Wrangling Data @dog_rates aka. WeRateDogs

Introduction

Real-world data rarely come clean. Using Python and its libraries, we will gather data from a variety of sources and in a variety of formats, assess its quality and tidiness, then clean it. This is called data wrangling. We will document our wrangling efforts in a Jupyter Notebook, plus showcase them through analyses and visualizations using Python its libraries.

The dataset that we will be wrangling (and analyzing and visualizing) is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10. 11/10, 12/10, 13/10, etc. Why? Because "they're good dogs, Brent". WeRateDogs has over 4 million followers and has received international media coverage.

Software that we will be used
Since we work in a local environment, the following libraries should be installed:

  • pandas
  • NumPy
  • requests
  • tweepy
  • json

Context
Goal: wrangle WeRateDogs Twitter data to create interesting and trustworthy analyses and visualizations.

The Data

  • Enhanced Twitter Archive

    The WeRateDogs Twitter archive contains basic tweet data for all 2356 of their tweets. Containing one column the archive does contain though: each tweet's text, which Udacity team has extracted the rating, dog name, and dog "stage" (i.e. doggo, floofer, pupper, and puppo) to make this Twitter archive "enhanced".

  • Additional Data via the Twitter API

    Then we need retweet count and favorite count are two of the notable column omissions. Fortunately, this additional data can be gathered by anyone from Twitter's API. Using this API we can extract needed data to make our dataset more concise.

  • Image Predictions File

    The Udacity team has run every image in the WeRateDogs Twitter archive through a neural network that can classify breeds of dogs. The results are so amazing: a table full of image predictions (the top three only) alongside each tweet ID, image URL, and the image number that corresponded to the most confident prediction.

Project Details

  • Data wrangling, which consists of:

    Gathering data
    Assessing data
    Cleaning data

  • Storing, analyzing, and visualizing your wrangled data
  • Reporting on:

    1) your data wrangling efforts and
    2) your data analyses and visualizations

Gather Data

  • The WeRateDogs Twitter archive.

    The archive data is downloaded manually from the Udacity lesson's page, then we will be inserted using Pandas libraries.

  • The tweet image predictions.

    This data is hosted on Udacity's servers and should be downloaded programmatically using the Requests library and the following URL: https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv.

  • Each tweet's retweet count and favorite ("like") count at minimum, and any additional data may be interesting.

    For this data we will be using TwitterAPI and Tweepy library. Using the tweet IDs in the WeRateDogs Twitter archive, query the Twitter API for each tweet's JSON data using Python's Tweepy library and store each tweet's entire set of JSON data in a file called tweet_json.txt file. Each tweet's JSON data should be written to its line. Then read this .txt file line by line into a pandas DataFrame with (at minimum) tweet ID, retweet count, and favorite count.


As usual, we need to import useful packages before doing anything in this project.

In [1]:
import os
import re
import json
import tweepy
import requests
import numpy as np
import pandas as pd
import seaborn as sns
from PIL import Image
from io import BytesIO
from tweepy import OAuthHandler
import matplotlib.pyplot as plt
from timeit import default_timer as timer

WeRateDogs Twitter archive

This was data in our hand right now.

In [2]:
archive_df = pd.read_csv('twitter-archive-enhanced.csv')
archive_df
Out[2]:
tweet_id in_reply_to_status_id in_reply_to_user_id timestamp source text retweeted_status_id retweeted_status_user_id retweeted_status_timestamp expanded_urls rating_numerator rating_denominator name doggo floofer pupper puppo
0 892420643555336193 NaN NaN 2017-08-01 16:23:56 +0000 <a href="http://twitter.com/download/iphone" r... This is Phineas. He's a mystical boy. Only eve... NaN NaN NaN https://twitter.com/dog_rates/status/892420643... 13 10 Phineas None None None None
1 892177421306343426 NaN NaN 2017-08-01 00:17:27 +0000 <a href="http://twitter.com/download/iphone" r... This is Tilly. She's just checking pup on you.... NaN NaN NaN https://twitter.com/dog_rates/status/892177421... 13 10 Tilly None None None None
2 891815181378084864 NaN NaN 2017-07-31 00:18:03 +0000 <a href="http://twitter.com/download/iphone" r... This is Archie. He is a rare Norwegian Pouncin... NaN NaN NaN https://twitter.com/dog_rates/status/891815181... 12 10 Archie None None None None
3 891689557279858688 NaN NaN 2017-07-30 15:58:51 +0000 <a href="http://twitter.com/download/iphone" r... This is Darla. She commenced a snooze mid meal... NaN NaN NaN https://twitter.com/dog_rates/status/891689557... 13 10 Darla None None None None
4 891327558926688256 NaN NaN 2017-07-29 16:00:24 +0000 <a href="http://twitter.com/download/iphone" r... This is Franklin. He would like you to stop ca... NaN NaN NaN https://twitter.com/dog_rates/status/891327558... 12 10 Franklin None None None None
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2351 666049248165822465 NaN NaN 2015-11-16 00:24:50 +0000 <a href="http://twitter.com/download/iphone" r... Here we have a 1949 1st generation vulpix. Enj... NaN NaN NaN https://twitter.com/dog_rates/status/666049248... 5 10 None None None None None
2352 666044226329800704 NaN NaN 2015-11-16 00:04:52 +0000 <a href="http://twitter.com/download/iphone" r... This is a purebred Piers Morgan. Loves to Netf... NaN NaN NaN https://twitter.com/dog_rates/status/666044226... 6 10 a None None None None
2353 666033412701032449 NaN NaN 2015-11-15 23:21:54 +0000 <a href="http://twitter.com/download/iphone" r... Here is a very happy pup. Big fan of well-main... NaN NaN NaN https://twitter.com/dog_rates/status/666033412... 9 10 a None None None None
2354 666029285002620928 NaN NaN 2015-11-15 23:05:30 +0000 <a href="http://twitter.com/download/iphone" r... This is a western brown Mitsubishi terrier. Up... NaN NaN NaN https://twitter.com/dog_rates/status/666029285... 7 10 a None None None None
2355 666020888022790149 NaN NaN 2015-11-15 22:32:08 +0000 <a href="http://twitter.com/download/iphone" r... Here we have a Japanese Irish Setter. Lost eye... NaN NaN NaN https://twitter.com/dog_rates/status/666020888... 8 10 None None None None None

2356 rows × 17 columns

The tweet image predictions.

The tweet image predictions, i.e., what breed of dog (or other object, animal, etc.) is present in each tweet according to a neural network. This file (image_predictions.tsv) is hosted on Udacity's servers and should be downloaded programmatically.

In [3]:
url = 'https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv'

r = requests.get(url)  
with open('image-predictions.tsv', 'wb') as f:
    f.write(r.content)
In [3]:
image_df = pd.read_csv('image-predictions.tsv', sep='\t')
image_df
Out[3]:
tweet_id jpg_url img_num p1 p1_conf p1_dog p2 p2_conf p2_dog p3 p3_conf p3_dog
0 666020888022790149 https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg 1 Welsh_springer_spaniel 0.465074 True collie 0.156665 True Shetland_sheepdog 0.061428 True
1 666029285002620928 https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg 1 redbone 0.506826 True miniature_pinscher 0.074192 True Rhodesian_ridgeback 0.072010 True
2 666033412701032449 https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg 1 German_shepherd 0.596461 True malinois 0.138584 True bloodhound 0.116197 True
3 666044226329800704 https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg 1 Rhodesian_ridgeback 0.408143 True redbone 0.360687 True miniature_pinscher 0.222752 True
4 666049248165822465 https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg 1 miniature_pinscher 0.560311 True Rottweiler 0.243682 True Doberman 0.154629 True
... ... ... ... ... ... ... ... ... ... ... ... ...
2070 891327558926688256 https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg 2 basset 0.555712 True English_springer 0.225770 True German_short-haired_pointer 0.175219 True
2071 891689557279858688 https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg 1 paper_towel 0.170278 False Labrador_retriever 0.168086 True spatula 0.040836 False
2072 891815181378084864 https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg 1 Chihuahua 0.716012 True malamute 0.078253 True kelpie 0.031379 True
2073 892177421306343426 https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg 1 Chihuahua 0.323581 True Pekinese 0.090647 True papillon 0.068957 True
2074 892420643555336193 https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg 1 orange 0.097049 False bagel 0.085851 False banana 0.076110 False

2075 rows × 12 columns

Tweet's retweet count and favorite ("like") count at minimum, and any additional data may be interesting.

Using the tweet IDs in the WeRateDogs Twitter archive, query the Twitter API for each tweet's JSON data using Python's Tweepy library and store each tweet's entire set of JSON data in a file called tweet_json.txt file. Each tweet's JSON data should be written to its own line. Then read this .txt file line by line into a pandas DataFrame with (at minimum) tweet ID, retweet count, and favorite count. Note: do not include your Twitter API keys, secrets, and tokens in your project submission.

In [ ]:
# Query Twitter API for each tweet in the Twitter archive and save JSON in a text file
# These are hidden to comply with Twitter's API terms and conditions
consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

api = tweepy.API(auth, wait_on_rate_limit=True)

# NOTE TO STUDENT WITH MOBILE VERIFICATION ISSUES:
# df_1 is a DataFrame with the twitter_archive_enhanced.csv file. You may have to
# change line 17 to match the name of your DataFrame with twitter_archive_enhanced.csv
# NOTE TO REVIEWER: this student had mobile verification issues so the following
# Twitter API code was sent to this student from a Udacity instructor
# Tweet IDs for which to gather additional data via Twitter's API
tweet_ids = archive_df.tweet_id.values
len(tweet_ids)

# Query Twitter's API for JSON data for each tweet ID in the Twitter archive
count = 0
fails_dict = {}
start = timer()
# Save each tweet's returned JSON as a new line in a .txt file
with open('tweet_json.txt', 'w') as outfile:
    # This loop will likely take 20-30 minutes to run because of Twitter's rate limit
    for tweet_id in tweet_ids:
        count += 1
        print(str(count) + ": " + str(tweet_id))
        try:
            tweet = api.get_status(tweet_id, tweet_mode='extended')
            print("Success")
            json.dump(tweet._json, outfile)
            outfile.write('\n')
        except tweepy.TweepError as e:
            print("Fail")
            fails_dict[tweet_id] = e
            pass
end = timer()
print(end - start)
print(fails_dict)
In [4]:
tweepy_df = pd.read_json("tweet_json.txt", lines=True)
tweepy_df
Out[4]:
created_at id id_str full_text truncated display_text_range entities extended_entities source in_reply_to_status_id ... favorited retweeted possibly_sensitive possibly_sensitive_appealable lang retweeted_status quoted_status_id quoted_status_id_str quoted_status_permalink quoted_status
0 2017-08-01 16:23:56+00:00 892420643555336193 892420643555336192 This is Phineas. He's a mystical boy. Only eve... False [0, 85] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 892420639486877696, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
1 2017-08-01 00:17:27+00:00 892177421306343426 892177421306343424 This is Tilly. She's just checking pup on you.... False [0, 138] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 892177413194625024, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
2 2017-07-31 00:18:03+00:00 891815181378084864 891815181378084864 This is Archie. He is a rare Norwegian Pouncin... False [0, 121] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 891815175371796480, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
3 2017-07-30 15:58:51+00:00 891689557279858688 891689557279858688 This is Darla. She commenced a snooze mid meal... False [0, 79] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 891689552724799489, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
4 2017-07-29 16:00:24+00:00 891327558926688256 891327558926688256 This is Franklin. He would like you to stop ca... False [0, 138] {'hashtags': [{'text': 'BarkWeek', 'indices': ... {'media': [{'id': 891327551943041024, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2317 2015-11-16 00:24:50+00:00 666049248165822465 666049248165822464 Here we have a 1949 1st generation vulpix. Enj... False [0, 120] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 666049244999131136, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
2318 2015-11-16 00:04:52+00:00 666044226329800704 666044226329800704 This is a purebred Piers Morgan. Loves to Netf... False [0, 137] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 666044217047650304, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
2319 2015-11-15 23:21:54+00:00 666033412701032449 666033412701032448 Here is a very happy pup. Big fan of well-main... False [0, 130] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 666033409081393153, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
2320 2015-11-15 23:05:30+00:00 666029285002620928 666029285002620928 This is a western brown Mitsubishi terrier. Up... False [0, 139] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 666029276303482880, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
2321 2015-11-15 22:32:08+00:00 666020888022790149 666020888022790144 Here we have a Japanese Irish Setter. Lost eye... False [0, 131] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 666020881337073664, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN

2322 rows × 32 columns


Assessing Data

In this step, we will be assessing them visually and programmatically for quality and tidiness issues using two types of assessment. We will be intensively using Pandas and its method, i.e:

  • .describe() to see the summary statistic
  • .info() to see the data types each column and detect missing data
  • .duplicates() to see if there is any duplicated row
  • we also using some loops to see the weird rating on the archive dataframe

Key Points
Key points in the data wrangling process for this project:

  • We want original ratings (no retweets) that have images.
  • Cleaning includes merging individual pieces of data according to the rules of tidy data.
  • The fact that the rating numerators are greater than the denominators does not need to be cleaned. This unique rating system is a big part of the popularity of WeRateDogs.

Archive Dataframe

In [5]:
archive_df
Out[5]:
tweet_id in_reply_to_status_id in_reply_to_user_id timestamp source text retweeted_status_id retweeted_status_user_id retweeted_status_timestamp expanded_urls rating_numerator rating_denominator name doggo floofer pupper puppo
0 892420643555336193 NaN NaN 2017-08-01 16:23:56 +0000 <a href="http://twitter.com/download/iphone" r... This is Phineas. He's a mystical boy. Only eve... NaN NaN NaN https://twitter.com/dog_rates/status/892420643... 13 10 Phineas None None None None
1 892177421306343426 NaN NaN 2017-08-01 00:17:27 +0000 <a href="http://twitter.com/download/iphone" r... This is Tilly. She's just checking pup on you.... NaN NaN NaN https://twitter.com/dog_rates/status/892177421... 13 10 Tilly None None None None
2 891815181378084864 NaN NaN 2017-07-31 00:18:03 +0000 <a href="http://twitter.com/download/iphone" r... This is Archie. He is a rare Norwegian Pouncin... NaN NaN NaN https://twitter.com/dog_rates/status/891815181... 12 10 Archie None None None None
3 891689557279858688 NaN NaN 2017-07-30 15:58:51 +0000 <a href="http://twitter.com/download/iphone" r... This is Darla. She commenced a snooze mid meal... NaN NaN NaN https://twitter.com/dog_rates/status/891689557... 13 10 Darla None None None None
4 891327558926688256 NaN NaN 2017-07-29 16:00:24 +0000 <a href="http://twitter.com/download/iphone" r... This is Franklin. He would like you to stop ca... NaN NaN NaN https://twitter.com/dog_rates/status/891327558... 12 10 Franklin None None None None
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2351 666049248165822465 NaN NaN 2015-11-16 00:24:50 +0000 <a href="http://twitter.com/download/iphone" r... Here we have a 1949 1st generation vulpix. Enj... NaN NaN NaN https://twitter.com/dog_rates/status/666049248... 5 10 None None None None None
2352 666044226329800704 NaN NaN 2015-11-16 00:04:52 +0000 <a href="http://twitter.com/download/iphone" r... This is a purebred Piers Morgan. Loves to Netf... NaN NaN NaN https://twitter.com/dog_rates/status/666044226... 6 10 a None None None None
2353 666033412701032449 NaN NaN 2015-11-15 23:21:54 +0000 <a href="http://twitter.com/download/iphone" r... Here is a very happy pup. Big fan of well-main... NaN NaN NaN https://twitter.com/dog_rates/status/666033412... 9 10 a None None None None
2354 666029285002620928 NaN NaN 2015-11-15 23:05:30 +0000 <a href="http://twitter.com/download/iphone" r... This is a western brown Mitsubishi terrier. Up... NaN NaN NaN https://twitter.com/dog_rates/status/666029285... 7 10 a None None None None
2355 666020888022790149 NaN NaN 2015-11-15 22:32:08 +0000 <a href="http://twitter.com/download/iphone" r... Here we have a Japanese Irish Setter. Lost eye... NaN NaN NaN https://twitter.com/dog_rates/status/666020888... 8 10 None None None None None

2356 rows × 17 columns

In [6]:
archive_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 17 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   tweet_id                    2356 non-null   int64  
 1   in_reply_to_status_id       78 non-null     float64
 2   in_reply_to_user_id         78 non-null     float64
 3   timestamp                   2356 non-null   object 
 4   source                      2356 non-null   object 
 5   text                        2356 non-null   object 
 6   retweeted_status_id         181 non-null    float64
 7   retweeted_status_user_id    181 non-null    float64
 8   retweeted_status_timestamp  181 non-null    object 
 9   expanded_urls               2297 non-null   object 
 10  rating_numerator            2356 non-null   int64  
 11  rating_denominator          2356 non-null   int64  
 12  name                        2356 non-null   object 
 13  doggo                       2356 non-null   object 
 14  floofer                     2356 non-null   object 
 15  pupper                      2356 non-null   object 
 16  puppo                       2356 non-null   object 
dtypes: float64(4), int64(3), object(10)
memory usage: 313.0+ KB
In [7]:
archive_df.loc[archive_df['retweeted_status_id'].notnull()]
Out[7]:
tweet_id in_reply_to_status_id in_reply_to_user_id timestamp source text retweeted_status_id retweeted_status_user_id retweeted_status_timestamp expanded_urls rating_numerator rating_denominator name doggo floofer pupper puppo
19 888202515573088257 NaN NaN 2017-07-21 01:02:36 +0000 <a href="http://twitter.com/download/iphone" r... RT @dog_rates: This is Canela. She attempted s... 8.874740e+17 4.196984e+09 2017-07-19 00:47:34 +0000 https://twitter.com/dog_rates/status/887473957... 13 10 Canela None None None None
32 886054160059072513 NaN NaN 2017-07-15 02:45:48 +0000 <a href="http://twitter.com/download/iphone" r... RT @Athletics: 12/10 #BATP https://t.co/WxwJmv... 8.860537e+17 1.960740e+07 2017-07-15 02:44:07 +0000 https://twitter.com/dog_rates/status/886053434... 12 10 None None None None None
36 885311592912609280 NaN NaN 2017-07-13 01:35:06 +0000 <a href="http://twitter.com/download/iphone" r... RT @dog_rates: This is Lilly. She just paralle... 8.305833e+17 4.196984e+09 2017-02-12 01:04:29 +0000 https://twitter.com/dog_rates/status/830583320... 13 10 Lilly None None None None
68 879130579576475649 NaN NaN 2017-06-26 00:13:58 +0000 <a href="http://twitter.com/download/iphone" r... RT @dog_rates: This is Emmy. She was adopted t... 8.780576e+17 4.196984e+09 2017-06-23 01:10:23 +0000 https://twitter.com/dog_rates/status/878057613... 14 10 Emmy None None None None
73 878404777348136964 NaN NaN 2017-06-24 00:09:53 +0000 <a href="http://twitter.com/download/iphone" r... RT @dog_rates: Meet Shadow. In an attempt to r... 8.782815e+17 4.196984e+09 2017-06-23 16:00:04 +0000 https://www.gofundme.com/3yd6y1c,https://twitt... 13 10 Shadow None None None None
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1023 746521445350707200 NaN NaN 2016-06-25 01:52:36 +0000 <a href="http://twitter.com/download/iphone" r... RT @dog_rates: This is Shaggy. He knows exactl... 6.678667e+17 4.196984e+09 2015-11-21 00:46:50 +0000 https://twitter.com/dog_rates/status/667866724... 10 10 Shaggy None None None None
1043 743835915802583040 NaN NaN 2016-06-17 16:01:16 +0000 <a href="http://twitter.com/download/iphone" r... RT @dog_rates: Extremely intelligent dog here.... 6.671383e+17 4.196984e+09 2015-11-19 00:32:12 +0000 https://twitter.com/dog_rates/status/667138269... 10 10 None None None None None
1242 711998809858043904 NaN NaN 2016-03-21 19:31:59 +0000 <a href="http://twitter.com/download/iphone" r... RT @twitter: @dog_rates Awesome Tweet! 12/10. ... 7.119983e+17 7.832140e+05 2016-03-21 19:29:52 +0000 https://twitter.com/twitter/status/71199827977... 12 10 None None None None None
2259 667550904950915073 NaN NaN 2015-11-20 03:51:52 +0000 <a href="http://twitter.com" rel="nofollow">Tw... RT @dogratingrating: Exceptional talent. Origi... 6.675487e+17 4.296832e+09 2015-11-20 03:43:06 +0000 https://twitter.com/dogratingrating/status/667... 12 10 None None None None None
2260 667550882905632768 NaN NaN 2015-11-20 03:51:47 +0000 <a href="http://twitter.com" rel="nofollow">Tw... RT @dogratingrating: Unoriginal idea. Blatant ... 6.675484e+17 4.296832e+09 2015-11-20 03:41:59 +0000 https://twitter.com/dogratingrating/status/667... 5 10 None None None None None

181 rows × 17 columns

In [8]:
archive_df.describe()
Out[8]:
tweet_id in_reply_to_status_id in_reply_to_user_id retweeted_status_id retweeted_status_user_id rating_numerator rating_denominator
count 2.356000e+03 7.800000e+01 7.800000e+01 1.810000e+02 1.810000e+02 2356.000000 2356.000000
mean 7.427716e+17 7.455079e+17 2.014171e+16 7.720400e+17 1.241698e+16 13.126486 10.455433
std 6.856705e+16 7.582492e+16 1.252797e+17 6.236928e+16 9.599254e+16 45.876648 6.745237
min 6.660209e+17 6.658147e+17 1.185634e+07 6.661041e+17 7.832140e+05 0.000000 0.000000
25% 6.783989e+17 6.757419e+17 3.086374e+08 7.186315e+17 4.196984e+09 10.000000 10.000000
50% 7.196279e+17 7.038708e+17 4.196984e+09 7.804657e+17 4.196984e+09 11.000000 10.000000
75% 7.993373e+17 8.257804e+17 4.196984e+09 8.203146e+17 4.196984e+09 12.000000 10.000000
max 8.924206e+17 8.862664e+17 8.405479e+17 8.874740e+17 7.874618e+17 1776.000000 170.000000
In [9]:
# check numerator value counts
archive_df.rating_numerator.value_counts()
Out[9]:
12      558
11      464
10      461
13      351
9       158
8       102
7        55
14       54
5        37
6        32
3        19
4        17
1         9
2         9
420       2
0         2
15        2
75        2
80        1
20        1
24        1
26        1
44        1
50        1
60        1
165       1
84        1
88        1
144       1
182       1
143       1
666       1
960       1
1776      1
17        1
27        1
45        1
99        1
121       1
204       1
Name: rating_numerator, dtype: int64
In [10]:
# check single numerator text value
single_numerator = archive_df.rating_numerator.value_counts().index[-22:]

single_numerator_index = []
for s in single_numerator:
    row = archive_df.index[archive_df['rating_numerator'] == s].to_list()
    single_numerator_index.append(row[0])

for s in single_numerator_index:
    print(s, "\t", archive_df['text'][s], "\t",
          archive_df['rating_numerator'][s])
1254 	 Here's a brigade of puppers. All look very prepared for whatever happens next. 80/80 https://t.co/0eb7R1Om12 	 80
1663 	 I'm aware that I could've said 20/16, but here at WeRateDogs we are very professional. An inconsistent rating scale is simply irresponsible 	 20
516 	 Meet Sam. She smiles 24/7 &amp; secretly aspires to be a reindeer. 
Keep Sam smiling by clicking and sharing this link:
https://t.co/98tB8y7y7t https://t.co/LouL5vdvxx 	 24
1712 	 Here we have uncovered an entire battalion of holiday puppers. Average of 11.26/10 https://t.co/eNm2S6p9BD 	 26
1433 	 Happy Wednesday here's a bucket of pups. 44/40 would pet all at once https://t.co/HppvrYuamZ 	 44
1202 	 This is Bluebert. He just saw that both #FinalFur match ups are split 50/50. Amazed af. 11/10 https://t.co/Kky1DPG4iq 	 50
1351 	 Here is a whole flock of puppers.  60/50 I'll take the lot https://t.co/9dpcw6MdWa 	 60
902 	 Why does this never happen at my front door... 165/150 https://t.co/HmwrdfEfUE 	 165
433 	 The floofs have been released I repeat the floofs have been released. 84/70 https://t.co/NIYC820tmd 	 84
1843 	 Here we have an entire platoon of puppers. Total score: 88/80 would pet all at once https://t.co/y93p6FLvVw 	 88
1779 	 IT'S PUPPERGEDDON. Total of 144/120 ...I think https://t.co/ZanVtAtvIq 	 144
290 	 @markhoppus 182/10 	 182
1634 	 Two sneaky puppers were not initially seen, moving the rating to 143/130. Please forgive us. Thank you https://t.co/kRK51Y5ac3 	 143
189 	 @s8n You tried very hard to portray this good boy as not so good, but you have ultimately failed. His goodness shines through. 666/10 	 666
313 	 @jonnysun @Lin_Manuel ok jomny I know you're excited but 960/00 isn't a valid rating, 13/10 is tho 	 960
979 	 This is Atticus. He's quite simply America af. 1776/10 https://t.co/GRXwMxLBkh 	 1776
55 	 @roushfenway These are good dogs but 17/10 is an emotional impulse rating. More like 13/10s 	 17
763 	 This is Sophie. She's a Jubilant Bush Pupper. Super h*ckin rare. Appears at random just to smile at the locals. 11.27/10 would smile back https://t.co/QFaUiIHxHq 	 27
1274 	 From left to right:
Cletus, Jerome, Alejandro, Burp, &amp; Titson
None know where camera is. 45/50 would hug all at once https://t.co/sedre1ivTK 	 45
1228 	 Happy Saturday here's 9 puppers on a bench. 99/90 good work everybody https://t.co/mpvaVxKmc1 	 99
1635 	 Someone help the girl is being mugged. Several are distracting her while two steal her shoes. Clever puppers 121/110 https://t.co/1zfnTJLt55 	 121
1120 	 Say hello to this unbelievably well behaved squad of doggos. 204/170 would try to pet all at once https://t.co/yGQI3He3xv 	 204
In [11]:
# check rating_denominator value counts
archive_df.rating_denominator.value_counts()
Out[11]:
10     2333
11        3
50        3
80        2
20        2
2         1
16        1
40        1
70        1
15        1
90        1
110       1
120       1
130       1
150       1
170       1
7         1
0         1
Name: rating_denominator, dtype: int64
In [12]:
# check single denominator text value

single_denominator = archive_df.rating_denominator.value_counts().index[5:]

single_denominator_index = []
for s in single_denominator:
    row = archive_df.index[archive_df['rating_denominator'] == s].to_list()
    single_denominator_index.append(row[0])

for s in single_denominator_index:
    print(s, "\t", archive_df['text'][s], "\t",
          archive_df['rating_denominator'][s])
2335 	 This is an Albanian 3 1/2 legged  Episcopalian. Loves well-polished hardwood flooring. Penis on the collar. 9/10 https://t.co/d9NcXFKwLv 	 2
1663 	 I'm aware that I could've said 20/16, but here at WeRateDogs we are very professional. An inconsistent rating scale is simply irresponsible 	 16
1433 	 Happy Wednesday here's a bucket of pups. 44/40 would pet all at once https://t.co/HppvrYuamZ 	 40
433 	 The floofs have been released I repeat the floofs have been released. 84/70 https://t.co/NIYC820tmd 	 70
342 	 @docmisterio account started on 11/15/15 	 15
1228 	 Happy Saturday here's 9 puppers on a bench. 99/90 good work everybody https://t.co/mpvaVxKmc1 	 90
1635 	 Someone help the girl is being mugged. Several are distracting her while two steal her shoes. Clever puppers 121/110 https://t.co/1zfnTJLt55 	 110
1779 	 IT'S PUPPERGEDDON. Total of 144/120 ...I think https://t.co/ZanVtAtvIq 	 120
1634 	 Two sneaky puppers were not initially seen, moving the rating to 143/130. Please forgive us. Thank you https://t.co/kRK51Y5ac3 	 130
902 	 Why does this never happen at my front door... 165/150 https://t.co/HmwrdfEfUE 	 150
1120 	 Say hello to this unbelievably well behaved squad of doggos. 204/170 would try to pet all at once https://t.co/yGQI3He3xv 	 170
516 	 Meet Sam. She smiles 24/7 &amp; secretly aspires to be a reindeer. 
Keep Sam smiling by clicking and sharing this link:
https://t.co/98tB8y7y7t https://t.co/LouL5vdvxx 	 7
313 	 @jonnysun @Lin_Manuel ok jomny I know you're excited but 960/00 isn't a valid rating, 13/10 is tho 	 0
In [13]:
archive_df[archive_df.duplicated()]
Out[13]:
tweet_id in_reply_to_status_id in_reply_to_user_id timestamp source text retweeted_status_id retweeted_status_user_id retweeted_status_timestamp expanded_urls rating_numerator rating_denominator name doggo floofer pupper puppo

Image Dataframe

In [14]:
image_df
Out[14]:
tweet_id jpg_url img_num p1 p1_conf p1_dog p2 p2_conf p2_dog p3 p3_conf p3_dog
0 666020888022790149 https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg 1 Welsh_springer_spaniel 0.465074 True collie 0.156665 True Shetland_sheepdog 0.061428 True
1 666029285002620928 https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg 1 redbone 0.506826 True miniature_pinscher 0.074192 True Rhodesian_ridgeback 0.072010 True
2 666033412701032449 https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg 1 German_shepherd 0.596461 True malinois 0.138584 True bloodhound 0.116197 True
3 666044226329800704 https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg 1 Rhodesian_ridgeback 0.408143 True redbone 0.360687 True miniature_pinscher 0.222752 True
4 666049248165822465 https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg 1 miniature_pinscher 0.560311 True Rottweiler 0.243682 True Doberman 0.154629 True
... ... ... ... ... ... ... ... ... ... ... ... ...
2070 891327558926688256 https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg 2 basset 0.555712 True English_springer 0.225770 True German_short-haired_pointer 0.175219 True
2071 891689557279858688 https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg 1 paper_towel 0.170278 False Labrador_retriever 0.168086 True spatula 0.040836 False
2072 891815181378084864 https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg 1 Chihuahua 0.716012 True malamute 0.078253 True kelpie 0.031379 True
2073 892177421306343426 https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg 1 Chihuahua 0.323581 True Pekinese 0.090647 True papillon 0.068957 True
2074 892420643555336193 https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg 1 orange 0.097049 False bagel 0.085851 False banana 0.076110 False

2075 rows × 12 columns

In [15]:
image_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075 entries, 0 to 2074
Data columns (total 12 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   tweet_id  2075 non-null   int64  
 1   jpg_url   2075 non-null   object 
 2   img_num   2075 non-null   int64  
 3   p1        2075 non-null   object 
 4   p1_conf   2075 non-null   float64
 5   p1_dog    2075 non-null   bool   
 6   p2        2075 non-null   object 
 7   p2_conf   2075 non-null   float64
 8   p2_dog    2075 non-null   bool   
 9   p3        2075 non-null   object 
 10  p3_conf   2075 non-null   float64
 11  p3_dog    2075 non-null   bool   
dtypes: bool(3), float64(3), int64(2), object(4)
memory usage: 152.1+ KB
In [16]:
image_df[image_df.jpg_url.duplicated()]
Out[16]:
tweet_id jpg_url img_num p1 p1_conf p1_dog p2 p2_conf p2_dog p3 p3_conf p3_dog
1297 752309394570878976 https://pbs.twimg.com/ext_tw_video_thumb/67535... 1 upright 0.303415 False golden_retriever 0.181351 True Brittany_spaniel 0.162084 True
1315 754874841593970688 https://pbs.twimg.com/media/CWza7kpWcAAdYLc.jpg 1 pug 0.272205 True bull_mastiff 0.251530 True bath_towel 0.116806 False
1333 757729163776290825 https://pbs.twimg.com/media/CWyD2HGUYAQ1Xa7.jpg 2 cash_machine 0.802333 False schipperke 0.045519 True German_shepherd 0.023353 True
1345 759159934323924993 https://pbs.twimg.com/media/CU1zsMSUAAAS0qW.jpg 1 Irish_terrier 0.254856 True briard 0.227716 True soft-coated_wheaten_terrier 0.223263 True
1349 759566828574212096 https://pbs.twimg.com/media/CkNjahBXAAQ2kWo.jpg 1 Labrador_retriever 0.967397 True golden_retriever 0.016641 True ice_bear 0.014858 False
... ... ... ... ... ... ... ... ... ... ... ... ...
1903 851953902622658560 https://pbs.twimg.com/media/C4KHj-nWQAA3poV.jpg 1 Staffordshire_bullterrier 0.757547 True American_Staffordshire_terrier 0.149950 True Chesapeake_Bay_retriever 0.047523 True
1944 861769973181624320 https://pbs.twimg.com/media/CzG425nWgAAnP7P.jpg 2 Arabian_camel 0.366248 False house_finch 0.209852 False cocker_spaniel 0.046403 True
1992 873697596434513921 https://pbs.twimg.com/media/DA7iHL5U0AA1OQo.jpg 1 laptop 0.153718 False French_bulldog 0.099984 True printer 0.077130 False
2041 885311592912609280 https://pbs.twimg.com/media/C4bTH6nWMAAX_bJ.jpg 1 Labrador_retriever 0.908703 True seat_belt 0.057091 False pug 0.011933 True
2055 888202515573088257 https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg 2 Pembroke 0.809197 True Rhodesian_ridgeback 0.054950 True beagle 0.038915 True

66 rows × 12 columns

Tweepy Dataframe

In [17]:
tweepy_df
Out[17]:
created_at id id_str full_text truncated display_text_range entities extended_entities source in_reply_to_status_id ... favorited retweeted possibly_sensitive possibly_sensitive_appealable lang retweeted_status quoted_status_id quoted_status_id_str quoted_status_permalink quoted_status
0 2017-08-01 16:23:56+00:00 892420643555336193 892420643555336192 This is Phineas. He's a mystical boy. Only eve... False [0, 85] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 892420639486877696, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
1 2017-08-01 00:17:27+00:00 892177421306343426 892177421306343424 This is Tilly. She's just checking pup on you.... False [0, 138] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 892177413194625024, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
2 2017-07-31 00:18:03+00:00 891815181378084864 891815181378084864 This is Archie. He is a rare Norwegian Pouncin... False [0, 121] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 891815175371796480, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
3 2017-07-30 15:58:51+00:00 891689557279858688 891689557279858688 This is Darla. She commenced a snooze mid meal... False [0, 79] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 891689552724799489, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
4 2017-07-29 16:00:24+00:00 891327558926688256 891327558926688256 This is Franklin. He would like you to stop ca... False [0, 138] {'hashtags': [{'text': 'BarkWeek', 'indices': ... {'media': [{'id': 891327551943041024, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2317 2015-11-16 00:24:50+00:00 666049248165822465 666049248165822464 Here we have a 1949 1st generation vulpix. Enj... False [0, 120] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 666049244999131136, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
2318 2015-11-16 00:04:52+00:00 666044226329800704 666044226329800704 This is a purebred Piers Morgan. Loves to Netf... False [0, 137] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 666044217047650304, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
2319 2015-11-15 23:21:54+00:00 666033412701032449 666033412701032448 Here is a very happy pup. Big fan of well-main... False [0, 130] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 666033409081393153, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
2320 2015-11-15 23:05:30+00:00 666029285002620928 666029285002620928 This is a western brown Mitsubishi terrier. Up... False [0, 139] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 666029276303482880, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
2321 2015-11-15 22:32:08+00:00 666020888022790149 666020888022790144 Here we have a Japanese Irish Setter. Lost eye... False [0, 131] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 666020881337073664, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN

2322 rows × 32 columns

In [18]:
tweepy_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2322 entries, 0 to 2321
Data columns (total 32 columns):
 #   Column                         Non-Null Count  Dtype              
---  ------                         --------------  -----              
 0   created_at                     2322 non-null   datetime64[ns, UTC]
 1   id                             2322 non-null   int64              
 2   id_str                         2322 non-null   int64              
 3   full_text                      2322 non-null   object             
 4   truncated                      2322 non-null   bool               
 5   display_text_range             2322 non-null   object             
 6   entities                       2322 non-null   object             
 7   extended_entities              2050 non-null   object             
 8   source                         2322 non-null   object             
 9   in_reply_to_status_id          76 non-null     float64            
 10  in_reply_to_status_id_str      76 non-null     float64            
 11  in_reply_to_user_id            76 non-null     float64            
 12  in_reply_to_user_id_str        76 non-null     float64            
 13  in_reply_to_screen_name        76 non-null     object             
 14  user                           2322 non-null   object             
 15  geo                            0 non-null      float64            
 16  coordinates                    0 non-null      float64            
 17  place                          1 non-null      object             
 18  contributors                   0 non-null      float64            
 19  is_quote_status                2322 non-null   bool               
 20  retweet_count                  2322 non-null   int64              
 21  favorite_count                 2322 non-null   int64              
 22  favorited                      2322 non-null   bool               
 23  retweeted                      2322 non-null   bool               
 24  possibly_sensitive             2187 non-null   float64            
 25  possibly_sensitive_appealable  2187 non-null   float64            
 26  lang                           2322 non-null   object             
 27  retweeted_status               162 non-null    object             
 28  quoted_status_id               26 non-null     float64            
 29  quoted_status_id_str           26 non-null     float64            
 30  quoted_status_permalink        26 non-null     object             
 31  quoted_status                  24 non-null     object             
dtypes: bool(4), datetime64[ns, UTC](1), float64(11), int64(4), object(12)
memory usage: 517.1+ KB
In [19]:
tweepy_df['retweeted_status'].value_counts()
Out[19]:
{'created_at': 'Sat Jul 15 02:44:07 +0000 2017', 'id': 886053734421102592, 'id_str': '886053734421102592', 'full_text': '12/10 #BATP https://t.co/WxwJmvjfxo', 'truncated': False, 'display_text_range': [0, 11], 'entities': {'hashtags': [{'text': 'BATP', 'indices': [6, 11]}], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/WxwJmvjfxo', 'expanded_url': 'https://twitter.com/dog_rates/status/886053434075471873', 'display_url': 'twitter.com/dog_rates/stat…', 'indices': [12, 35]}]}, 'source': '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 19607400, 'id_str': '19607400', 'name': 'Oakland A's', 'screen_name': 'Athletics', 'location': 'Oakland, CA', 'description': 'Official Twitter of the nine-time World Series champion Athletics | #RootedInOakland | Instagram: @athletics | Snapchat: athletics', 'url': 'https://t.co/r4DoRNY1zr', 'entities': {'url': {'urls': [{'url': 'https://t.co/r4DoRNY1zr', 'expanded_url': 'http://www.athletics.com', 'display_url': 'athletics.com', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 565555, 'friends_count': 542, 'listed_count': 5162, 'created_at': 'Tue Jan 27 18:40:21 +0000 2009', 'favourites_count': 27445, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 57978, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': 'FCB514', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1286704475059531777/dGrbr0eo_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1286704475059531777/dGrbr0eo_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/19607400/1595792133', 'profile_link_color': '2B463A', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '7BD193', 'profile_text_color': '333333', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': True, 'quoted_status_id': 886053434075471873, 'quoted_status_id_str': '886053434075471873', 'quoted_status_permalink': {'url': 'https://t.co/WxwJmvjfxo', 'expanded': 'https://twitter.com/dog_rates/status/886053434075471873', 'display': 'twitter.com/dog_rates/stat…'}, 'quoted_status': {'created_at': 'Sat Jul 15 02:42:55 +0000 2017', 'id': 886053434075471873, 'id_str': '886053434075471873', 'full_text': 'Our snapchat story is h*ckin ridiculous right now. The @Athletics really know how to host a Bark at the Park
https://t.co/gJx2GpMSyY https://t.co/6d2N0ctyC1', 'truncated': False, 'display_text_range': [0, 132], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [{'screen_name': 'Athletics', 'name': "Oakland A's", 'id': 19607400, 'id_str': '19607400', 'indices': [55, 65]}], 'urls': [{'url': 'https://t.co/gJx2GpMSyY', 'expanded_url': 'https://www.snapchat.com/add/weratedogs', 'display_url': 'snapchat.com/add/weratedogs', 'indices': [109, 132]}], 'media': [{'id': 886053427184254976, 'id_str': '886053427184254976', 'indices': [133, 156], 'media_url': 'http://pbs.twimg.com/media/DEvk5cNVwAAcISQ.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DEvk5cNVwAAcISQ.jpg', 'url': 'https://t.co/6d2N0ctyC1', 'display_url': 'pic.twitter.com/6d2N0ctyC1', 'expanded_url': 'https://twitter.com/dog_rates/status/886053434075471873/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 750, 'h': 1334, 'resize': 'fit'}, 'small': {'w': 382, 'h': 680, 'resize': 'fit'}, 'medium': {'w': 675, 'h': 1200, 'resize': 'fit'}}}]}, 'extended_entities': {'media': [{'id': 886053427184254976, 'id_str': '886053427184254976', 'indices': [133, 156], 'media_url': 'http://pbs.twimg.com/media/DEvk5cNVwAAcISQ.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DEvk5cNVwAAcISQ.jpg', 'url': 'https://t.co/6d2N0ctyC1', 'display_url': 'pic.twitter.com/6d2N0ctyC1', 'expanded_url': 'https://twitter.com/dog_rates/status/886053434075471873/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 750, 'h': 1334, 'resize': 'fit'}, 'small': {'w': 382, 'h': 680, 'resize': 'fit'}, 'medium': {'w': 675, 'h': 1200, 'resize': 'fit'}}}]}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4196983835, 'id_str': '4196983835', 'name': 'WeRateDogs®', 'screen_name': 'dog_rates', 'location': '「 DM YOUR DOGS 」', 'description': 'Your Only Source For Professional Dog Ratings Instagram and Facebook ➪ WeRateDogs partnerships@weratedogs.com ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀', 'url': 'https://t.co/Wrvtpnv7JV', 'entities': {'url': {'urls': [{'url': 'https://t.co/Wrvtpnv7JV', 'expanded_url': 'https://blacklivesmatters.carrd.co', 'display_url': 'blacklivesmatters.carrd.co', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 8815727, 'friends_count': 17, 'listed_count': 5695, 'created_at': 'Sun Nov 15 21:41:29 +0000 2015', 'favourites_count': 145866, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 12552, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1591077312', 'profile_link_color': 'F5ABB5', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 190, 'favorite_count': 3064, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'en'}, 'retweet_count': 100, 'favorite_count': 1442, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'und'}    1
{'created_at': 'Sat May 28 03:04:00 +0000 2016', 'id': 736392552031657984, 'id_str': '736392552031657984', 'full_text': 'Say hello to mad pupper. You know what you did. 13/10 would pet until no longer furustrated https://t.co/u1ulQ5heLX', 'truncated': False, 'display_text_range': [0, 115], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/u1ulQ5heLX', 'expanded_url': 'https://vine.co/v/iEggaEOiLO3', 'display_url': 'vine.co/v/iEggaEOiLO3', 'indices': [92, 115]}]}, 'source': '<a href="http://vine.co" rel="nofollow">Vine - Make a Scene</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4196983835, 'id_str': '4196983835', 'name': 'WeRateDogs®', 'screen_name': 'dog_rates', 'location': '「 DM YOUR DOGS 」', 'description': 'Your Only Source For Professional Dog Ratings Instagram and Facebook ➪ WeRateDogs partnerships@weratedogs.com ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀', 'url': 'https://t.co/Wrvtpnv7JV', 'entities': {'url': {'urls': [{'url': 'https://t.co/Wrvtpnv7JV', 'expanded_url': 'https://blacklivesmatters.carrd.co', 'display_url': 'blacklivesmatters.carrd.co', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 8815742, 'friends_count': 17, 'listed_count': 5695, 'created_at': 'Sun Nov 15 21:41:29 +0000 2015', 'favourites_count': 145866, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 12552, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1591077312', 'profile_link_color': 'F5ABB5', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 7251, 'favorite_count': 17450, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'en'}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               1
{'created_at': 'Tue Sep 13 16:30:07 +0000 2016', 'id': 775733305207554048, 'id_str': '775733305207554048', 'full_text': 'This is Anakin. He strives to reach his full doggo potential. Born with blurry tail tho. 11/10 would still pet well https://t.co/9CcBSxCXXG', 'truncated': False, 'display_text_range': [0, 115], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 775733297511067649, 'id_str': '775733297511067649', 'indices': [116, 139], 'media_url': 'http://pbs.twimg.com/media/CsP1UvaW8AExVSA.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CsP1UvaW8AExVSA.jpg', 'url': 'https://t.co/9CcBSxCXXG', 'display_url': 'pic.twitter.com/9CcBSxCXXG', 'expanded_url': 'https://twitter.com/dog_rates/status/775733305207554048/photo/1', 'type': 'photo', 'sizes': {'large': {'w': 600, 'h': 600, 'resize': 'fit'}, 'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 600, 'h': 600, 'resize': 'fit'}, 'small': {'w': 600, 'h': 600, 'resize': 'fit'}}}]}, 'extended_entities': {'media': [{'id': 775733297511067649, 'id_str': '775733297511067649', 'indices': [116, 139], 'media_url': 'http://pbs.twimg.com/media/CsP1UvaW8AExVSA.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CsP1UvaW8AExVSA.jpg', 'url': 'https://t.co/9CcBSxCXXG', 'display_url': 'pic.twitter.com/9CcBSxCXXG', 'expanded_url': 'https://twitter.com/dog_rates/status/775733305207554048/photo/1', 'type': 'photo', 'sizes': {'large': {'w': 600, 'h': 600, 'resize': 'fit'}, 'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 600, 'h': 600, 'resize': 'fit'}, 'small': {'w': 600, 'h': 600, 'resize': 'fit'}}}]}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4196983835, 'id_str': '4196983835', 'name': 'WeRateDogs®', 'screen_name': 'dog_rates', 'location': '「 DM YOUR DOGS 」', 'description': 'Your Only Source For Professional Dog Ratings Instagram and Facebook ➪ WeRateDogs partnerships@weratedogs.com ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀', 'url': 'https://t.co/Wrvtpnv7JV', 'entities': {'url': {'urls': [{'url': 'https://t.co/Wrvtpnv7JV', 'expanded_url': 'https://blacklivesmatters.carrd.co', 'display_url': 'blacklivesmatters.carrd.co', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 8815740, 'friends_count': 17, 'listed_count': 5695, 'created_at': 'Sun Nov 15 21:41:29 +0000 2015', 'favourites_count': 145866, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 12552, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1591077312', 'profile_link_color': 'F5ABB5', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 3998, 'favorite_count': 13921, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'en'}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        1
{'created_at': 'Thu Nov 19 00:32:12 +0000 2015', 'id': 667138269671505920, 'id_str': '667138269671505920', 'full_text': 'Extremely intelligent dog here. Has learned to walk like human. Even has his own dog. Very impressive 10/10 https://t.co/0DvHAMdA4V', 'truncated': False, 'display_text_range': [0, 131], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 667138263048585216, 'id_str': '667138263048585216', 'indices': [108, 131], 'media_url': 'http://pbs.twimg.com/media/CUImtzEVAAAZNJo.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CUImtzEVAAAZNJo.jpg', 'url': 'https://t.co/0DvHAMdA4V', 'display_url': 'pic.twitter.com/0DvHAMdA4V', 'expanded_url': 'https://twitter.com/dog_rates/status/667138269671505920/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 1024, 'h': 862, 'resize': 'fit'}, 'small': {'w': 680, 'h': 572, 'resize': 'fit'}, 'medium': {'w': 1024, 'h': 862, 'resize': 'fit'}}}]}, 'extended_entities': {'media': [{'id': 667138263048585216, 'id_str': '667138263048585216', 'indices': [108, 131], 'media_url': 'http://pbs.twimg.com/media/CUImtzEVAAAZNJo.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CUImtzEVAAAZNJo.jpg', 'url': 'https://t.co/0DvHAMdA4V', 'display_url': 'pic.twitter.com/0DvHAMdA4V', 'expanded_url': 'https://twitter.com/dog_rates/status/667138269671505920/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 1024, 'h': 862, 'resize': 'fit'}, 'small': {'w': 680, 'h': 572, 'resize': 'fit'}, 'medium': {'w': 1024, 'h': 862, 'resize': 'fit'}}}]}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4196983835, 'id_str': '4196983835', 'name': 'WeRateDogs®', 'screen_name': 'dog_rates', 'location': '「 DM YOUR DOGS 」', 'description': 'Your Only Source For Professional Dog Ratings Instagram and Facebook ➪ WeRateDogs partnerships@weratedogs.com ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀', 'url': 'https://t.co/Wrvtpnv7JV', 'entities': {'url': {'urls': [{'url': 'https://t.co/Wrvtpnv7JV', 'expanded_url': 'https://blacklivesmatters.carrd.co', 'display_url': 'blacklivesmatters.carrd.co', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 8815745, 'friends_count': 17, 'listed_count': 5696, 'created_at': 'Sun Nov 15 21:41:29 +0000 2015', 'favourites_count': 145866, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 12552, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1591077312', 'profile_link_color': 'F5ABB5', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 2044, 'favorite_count': 4332, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'en'}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             1
{'created_at': 'Sat Dec 17 00:38:52 +0000 2016', 'id': 809920764300447744, 'id_str': '809920764300447744', 'full_text': 'Please only send in dogs. We only rate dogs, not seemingly heartbroken ewoks. Thank you... still 10/10 would console https://t.co/HIraYS1Bzo', 'truncated': False, 'display_text_range': [0, 116], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 809920757623115780, 'id_str': '809920757623115780', 'indices': [117, 140], 'media_url': 'http://pbs.twimg.com/media/Cz1qo05XUAQ4qXp.jpg', 'media_url_https': 'https://pbs.twimg.com/media/Cz1qo05XUAQ4qXp.jpg', 'url': 'https://t.co/HIraYS1Bzo', 'display_url': 'pic.twitter.com/HIraYS1Bzo', 'expanded_url': 'https://twitter.com/dog_rates/status/809920764300447744/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 491, 'h': 680, 'resize': 'fit'}, 'medium': {'w': 867, 'h': 1200, 'resize': 'fit'}, 'large': {'w': 1149, 'h': 1590, 'resize': 'fit'}}}]}, 'extended_entities': {'media': [{'id': 809920757623115780, 'id_str': '809920757623115780', 'indices': [117, 140], 'media_url': 'http://pbs.twimg.com/media/Cz1qo05XUAQ4qXp.jpg', 'media_url_https': 'https://pbs.twimg.com/media/Cz1qo05XUAQ4qXp.jpg', 'url': 'https://t.co/HIraYS1Bzo', 'display_url': 'pic.twitter.com/HIraYS1Bzo', 'expanded_url': 'https://twitter.com/dog_rates/status/809920764300447744/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 491, 'h': 680, 'resize': 'fit'}, 'medium': {'w': 867, 'h': 1200, 'resize': 'fit'}, 'large': {'w': 1149, 'h': 1590, 'resize': 'fit'}}}]}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4196983835, 'id_str': '4196983835', 'name': 'WeRateDogs®', 'screen_name': 'dog_rates', 'location': '「 DM YOUR DOGS 」', 'description': 'Your Only Source For Professional Dog Ratings Instagram and Facebook ➪ WeRateDogs partnerships@weratedogs.com ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀', 'url': 'https://t.co/Wrvtpnv7JV', 'entities': {'url': {'urls': [{'url': 'https://t.co/Wrvtpnv7JV', 'expanded_url': 'https://blacklivesmatters.carrd.co', 'display_url': 'blacklivesmatters.carrd.co', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 8815731, 'friends_count': 17, 'listed_count': 5695, 'created_at': 'Sun Nov 15 21:41:29 +0000 2015', 'favourites_count': 145866, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 12552, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1591077312', 'profile_link_color': 'F5ABB5', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 3982, 'favorite_count': 15699, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'en'}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 1
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              ..
{'created_at': 'Sun Feb 19 01:23:00 +0000 2017', 'id': 833124694597443584, 'id_str': '833124694597443584', 'full_text': 'This is Gidget. She's a spy pupper. Stealthy as h*ck. Must've slipped pup and got caught. 12/10 would forgive then pet https://t.co/zD97KYFaFa', 'truncated': False, 'display_text_range': [0, 118], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 833124662091542528, 'id_str': '833124662091542528', 'indices': [119, 142], 'media_url': 'http://pbs.twimg.com/media/C4_ad1GVcAAgvx6.jpg', 'media_url_https': 'https://pbs.twimg.com/media/C4_ad1GVcAAgvx6.jpg', 'url': 'https://t.co/zD97KYFaFa', 'display_url': 'pic.twitter.com/zD97KYFaFa', 'expanded_url': 'https://twitter.com/dog_rates/status/833124694597443584/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 675, 'h': 1200, 'resize': 'fit'}, 'small': {'w': 383, 'h': 680, 'resize': 'fit'}, 'large': {'w': 1152, 'h': 2048, 'resize': 'fit'}}}]}, 'extended_entities': {'media': [{'id': 833124662091542528, 'id_str': '833124662091542528', 'indices': [119, 142], 'media_url': 'http://pbs.twimg.com/media/C4_ad1GVcAAgvx6.jpg', 'media_url_https': 'https://pbs.twimg.com/media/C4_ad1GVcAAgvx6.jpg', 'url': 'https://t.co/zD97KYFaFa', 'display_url': 'pic.twitter.com/zD97KYFaFa', 'expanded_url': 'https://twitter.com/dog_rates/status/833124694597443584/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 675, 'h': 1200, 'resize': 'fit'}, 'small': {'w': 383, 'h': 680, 'resize': 'fit'}, 'large': {'w': 1152, 'h': 2048, 'resize': 'fit'}}}, {'id': 833124662095679488, 'id_str': '833124662095679488', 'indices': [119, 142], 'media_url': 'http://pbs.twimg.com/media/C4_ad1HUkAAWbJp.jpg', 'media_url_https': 'https://pbs.twimg.com/media/C4_ad1HUkAAWbJp.jpg', 'url': 'https://t.co/zD97KYFaFa', 'display_url': 'pic.twitter.com/zD97KYFaFa', 'expanded_url': 'https://twitter.com/dog_rates/status/833124694597443584/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 675, 'h': 1200, 'resize': 'fit'}, 'small': {'w': 383, 'h': 680, 'resize': 'fit'}, 'large': {'w': 1152, 'h': 2048, 'resize': 'fit'}}}, {'id': 833124662099877889, 'id_str': '833124662099877889', 'indices': [119, 142], 'media_url': 'http://pbs.twimg.com/media/C4_ad1IUoAEspsk.jpg', 'media_url_https': 'https://pbs.twimg.com/media/C4_ad1IUoAEspsk.jpg', 'url': 'https://t.co/zD97KYFaFa', 'display_url': 'pic.twitter.com/zD97KYFaFa', 'expanded_url': 'https://twitter.com/dog_rates/status/833124694597443584/photo/1', 'type': 'photo', 'sizes': {'large': {'w': 1150, 'h': 2048, 'resize': 'fit'}, 'small': {'w': 382, 'h': 680, 'resize': 'fit'}, 'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 674, 'h': 1200, 'resize': 'fit'}}}]}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4196983835, 'id_str': '4196983835', 'name': 'WeRateDogs®', 'screen_name': 'dog_rates', 'location': '「 DM YOUR DOGS 」', 'description': 'Your Only Source For Professional Dog Ratings Instagram and Facebook ➪ WeRateDogs partnerships@weratedogs.com ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀', 'url': 'https://t.co/Wrvtpnv7JV', 'entities': {'url': {'urls': [{'url': 'https://t.co/Wrvtpnv7JV', 'expanded_url': 'https://blacklivesmatters.carrd.co', 'display_url': 'blacklivesmatters.carrd.co', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 8815731, 'friends_count': 17, 'listed_count': 5695, 'created_at': 'Sun Nov 15 21:41:29 +0000 2015', 'favourites_count': 145866, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 12552, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1591077312', 'profile_link_color': 'F5ABB5', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 4802, 'favorite_count': 20111, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'en'}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         1
{'created_at': 'Wed Dec 16 01:27:03 +0000 2015', 'id': 676936541936185344, 'id_str': '676936541936185344', 'full_text': 'Here we see a rare pouched pupper. Ample storage space. Looks alert. Jumps at random. Kicked open that door. 8/10 https://t.co/mqvaxleHRz', 'truncated': False, 'display_text_range': [0, 137], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 676936535535656961, 'id_str': '676936535535656961', 'indices': [114, 137], 'media_url': 'http://pbs.twimg.com/media/CWT2MUgWIAECWig.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CWT2MUgWIAECWig.jpg', 'url': 'https://t.co/mqvaxleHRz', 'display_url': 'pic.twitter.com/mqvaxleHRz', 'expanded_url': 'https://twitter.com/dog_rates/status/676936541936185344/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 510, 'h': 680, 'resize': 'fit'}, 'large': {'w': 768, 'h': 1024, 'resize': 'fit'}, 'medium': {'w': 768, 'h': 1024, 'resize': 'fit'}}}]}, 'extended_entities': {'media': [{'id': 676936535535656961, 'id_str': '676936535535656961', 'indices': [114, 137], 'media_url': 'http://pbs.twimg.com/media/CWT2MUgWIAECWig.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CWT2MUgWIAECWig.jpg', 'url': 'https://t.co/mqvaxleHRz', 'display_url': 'pic.twitter.com/mqvaxleHRz', 'expanded_url': 'https://twitter.com/dog_rates/status/676936541936185344/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 510, 'h': 680, 'resize': 'fit'}, 'large': {'w': 768, 'h': 1024, 'resize': 'fit'}, 'medium': {'w': 768, 'h': 1024, 'resize': 'fit'}}}]}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4196983835, 'id_str': '4196983835', 'name': 'WeRateDogs®', 'screen_name': 'dog_rates', 'location': '「 DM YOUR DOGS 」', 'description': 'Your Only Source For Professional Dog Ratings Instagram and Facebook ➪ WeRateDogs partnerships@weratedogs.com ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀', 'url': 'https://t.co/Wrvtpnv7JV', 'entities': {'url': {'urls': [{'url': 'https://t.co/Wrvtpnv7JV', 'expanded_url': 'https://blacklivesmatters.carrd.co', 'display_url': 'blacklivesmatters.carrd.co', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 8815740, 'friends_count': 17, 'listed_count': 5695, 'created_at': 'Sun Nov 15 21:41:29 +0000 2015', 'favourites_count': 145866, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 12552, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1591077312', 'profile_link_color': 'F5ABB5', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 4787, 'favorite_count': 12377, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'en'}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      1
{'created_at': 'Sun Nov 20 00:59:15 +0000 2016', 'id': 800141422401830912, 'id_str': '800141422401830912', 'full_text': 'This is Peaches. She's the ultimate selfie sidekick. Super sneaky tongue slip appreciated. 13/10 https://t.co/pbKOesr8Tg', 'truncated': False, 'display_text_range': [0, 96], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 800141411257643009, 'id_str': '800141411257643009', 'indices': [97, 120], 'media_url': 'http://pbs.twimg.com/media/CxqsX8wXcAEnc3u.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CxqsX8wXcAEnc3u.jpg', 'url': 'https://t.co/pbKOesr8Tg', 'display_url': 'pic.twitter.com/pbKOesr8Tg', 'expanded_url': 'https://twitter.com/dog_rates/status/800141422401830912/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 680, 'h': 510, 'resize': 'fit'}, 'medium': {'w': 1024, 'h': 768, 'resize': 'fit'}, 'large': {'w': 1024, 'h': 768, 'resize': 'fit'}}}]}, 'extended_entities': {'media': [{'id': 800141411257643009, 'id_str': '800141411257643009', 'indices': [97, 120], 'media_url': 'http://pbs.twimg.com/media/CxqsX8wXcAEnc3u.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CxqsX8wXcAEnc3u.jpg', 'url': 'https://t.co/pbKOesr8Tg', 'display_url': 'pic.twitter.com/pbKOesr8Tg', 'expanded_url': 'https://twitter.com/dog_rates/status/800141422401830912/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'small': {'w': 680, 'h': 510, 'resize': 'fit'}, 'medium': {'w': 1024, 'h': 768, 'resize': 'fit'}, 'large': {'w': 1024, 'h': 768, 'resize': 'fit'}}}, {'id': 800141411266007041, 'id_str': '800141411266007041', 'indices': [97, 120], 'media_url': 'http://pbs.twimg.com/media/CxqsX8yXEAEkgUe.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CxqsX8yXEAEkgUe.jpg', 'url': 'https://t.co/pbKOesr8Tg', 'display_url': 'pic.twitter.com/pbKOesr8Tg', 'expanded_url': 'https://twitter.com/dog_rates/status/800141422401830912/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 1024, 'h': 768, 'resize': 'fit'}, 'medium': {'w': 1024, 'h': 768, 'resize': 'fit'}, 'small': {'w': 680, 'h': 510, 'resize': 'fit'}}}, {'id': 800141411844837376, 'id_str': '800141411844837376', 'indices': [97, 120], 'media_url': 'http://pbs.twimg.com/media/CxqsX-8XUAAEvjD.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CxqsX-8XUAAEvjD.jpg', 'url': 'https://t.co/pbKOesr8Tg', 'display_url': 'pic.twitter.com/pbKOesr8Tg', 'expanded_url': 'https://twitter.com/dog_rates/status/800141422401830912/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 1024, 'h': 768, 'resize': 'fit'}, 'medium': {'w': 1024, 'h': 768, 'resize': 'fit'}, 'small': {'w': 680, 'h': 510, 'resize': 'fit'}}}]}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4196983835, 'id_str': '4196983835', 'name': 'WeRateDogs®', 'screen_name': 'dog_rates', 'location': '「 DM YOUR DOGS 」', 'description': 'Your Only Source For Professional Dog Ratings Instagram and Facebook ➪ WeRateDogs partnerships@weratedogs.com ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀', 'url': 'https://t.co/Wrvtpnv7JV', 'entities': {'url': {'urls': [{'url': 'https://t.co/Wrvtpnv7JV', 'expanded_url': 'https://blacklivesmatters.carrd.co', 'display_url': 'blacklivesmatters.carrd.co', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 8815734, 'friends_count': 17, 'listed_count': 5695, 'created_at': 'Sun Nov 15 21:41:29 +0000 2015', 'favourites_count': 145866, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 12552, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1591077312', 'profile_link_color': 'F5ABB5', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 2573, 'favorite_count': 15455, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'en'}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        1
{'created_at': 'Tue Jul 05 20:41:01 +0000 2016', 'id': 750429297815552001, 'id_str': '750429297815552001', 'full_text': 'This is Arnie. He's a Nova Scotian Fridge Floof. Rare af. 12/10 https://t.co/lprdOylVpS', 'truncated': False, 'display_text_range': [0, 63], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 750429289032642560, 'id_str': '750429289032642560', 'indices': [64, 87], 'media_url': 'http://pbs.twimg.com/media/CmoPdmHW8AAi8BI.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CmoPdmHW8AAi8BI.jpg', 'url': 'https://t.co/lprdOylVpS', 'display_url': 'pic.twitter.com/lprdOylVpS', 'expanded_url': 'https://twitter.com/dog_rates/status/750429297815552001/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 1024, 'h': 768, 'resize': 'fit'}, 'small': {'w': 680, 'h': 510, 'resize': 'fit'}, 'large': {'w': 1024, 'h': 768, 'resize': 'fit'}}}]}, 'extended_entities': {'media': [{'id': 750429289032642560, 'id_str': '750429289032642560', 'indices': [64, 87], 'media_url': 'http://pbs.twimg.com/media/CmoPdmHW8AAi8BI.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CmoPdmHW8AAi8BI.jpg', 'url': 'https://t.co/lprdOylVpS', 'display_url': 'pic.twitter.com/lprdOylVpS', 'expanded_url': 'https://twitter.com/dog_rates/status/750429297815552001/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 1024, 'h': 768, 'resize': 'fit'}, 'small': {'w': 680, 'h': 510, 'resize': 'fit'}, 'large': {'w': 1024, 'h': 768, 'resize': 'fit'}}}, {'id': 750429288596373504, 'id_str': '750429288596373504', 'indices': [64, 87], 'media_url': 'http://pbs.twimg.com/media/CmoPdkfWAAAagwY.jpg', 'media_url_https': 'https://pbs.twimg.com/media/CmoPdkfWAAAagwY.jpg', 'url': 'https://t.co/lprdOylVpS', 'display_url': 'pic.twitter.com/lprdOylVpS', 'expanded_url': 'https://twitter.com/dog_rates/status/750429297815552001/photo/1', 'type': 'photo', 'sizes': {'small': {'w': 510, 'h': 680, 'resize': 'fit'}, 'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 768, 'h': 1024, 'resize': 'fit'}, 'medium': {'w': 768, 'h': 1024, 'resize': 'fit'}}}]}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4196983835, 'id_str': '4196983835', 'name': 'WeRateDogs®', 'screen_name': 'dog_rates', 'location': '「 DM YOUR DOGS 」', 'description': 'Your Only Source For Professional Dog Ratings Instagram and Facebook ➪ WeRateDogs partnerships@weratedogs.com ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀', 'url': 'https://t.co/Wrvtpnv7JV', 'entities': {'url': {'urls': [{'url': 'https://t.co/Wrvtpnv7JV', 'expanded_url': 'https://blacklivesmatters.carrd.co', 'display_url': 'blacklivesmatters.carrd.co', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 8815743, 'friends_count': 17, 'listed_count': 5695, 'created_at': 'Sun Nov 15 21:41:29 +0000 2015', 'favourites_count': 145866, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 12552, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1591077312', 'profile_link_color': 'F5ABB5', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 4238, 'favorite_count': 13094, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'en'}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     1
{'created_at': 'Wed Jan 06 20:16:44 +0000 2016', 'id': 684830982659280897, 'id_str': '684830982659280897', 'full_text': 'This little fella really hates stairs. Prefers bush. 13/10 legendary pupper https://t.co/e3LPMAHj7p', 'truncated': False, 'display_text_range': [0, 99], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/e3LPMAHj7p', 'expanded_url': 'https://vine.co/v/eEZXZI1rqxX', 'display_url': 'vine.co/v/eEZXZI1rqxX', 'indices': [76, 99]}]}, 'source': '<a href="http://vine.co" rel="nofollow">Vine - Make a Scene</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4196983835, 'id_str': '4196983835', 'name': 'WeRateDogs®', 'screen_name': 'dog_rates', 'location': '「 DM YOUR DOGS 」', 'description': 'Your Only Source For Professional Dog Ratings Instagram and Facebook ➪ WeRateDogs partnerships@weratedogs.com ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀', 'url': 'https://t.co/Wrvtpnv7JV', 'entities': {'url': {'urls': [{'url': 'https://t.co/Wrvtpnv7JV', 'expanded_url': 'https://blacklivesmatters.carrd.co', 'display_url': 'blacklivesmatters.carrd.co', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 8815741, 'friends_count': 17, 'listed_count': 5695, 'created_at': 'Sun Nov 15 21:41:29 +0000 2015', 'favourites_count': 145866, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 12552, 'lang': None, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1267972589722296320/XBr04M6J_normal.jpg', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1591077312', 'profile_link_color': 'F5ABB5', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'following': False, 'follow_request_sent': False, 'notifications': False, 'translator_type': 'none'}, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 21334, 'favorite_count': 34569, 'favorited': False, 'retweeted': False, 'possibly_sensitive': False, 'possibly_sensitive_appealable': False, 'lang': 'en'}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                1
Name: retweeted_status, Length: 162, dtype: int64

From the assessment process above, the result is divide into two kinds, quality and tidiness issues.

Quality

Quality: issues with content. Low-quality data is also known as dirty data.

archive dataframe:

  • keep the original tweet except the retweeted
  • some not useful columns for analysis i.e: in_reply_to_status_id, in_reply_to_user_id, source, expanded_urls, retweeted_status_id, retweeted_status_user_id, and retweeted_status_timestamp
  • tweet_id in int64 Dtype
  • timestamp in object Dtype
  • wrong numerator (decimal value or false detection) in index 516, 1712, 1202, and 763
  • wrong denominator in index 2335, 342, and 516
  • 'None' value instead of NaN in name and dog stages colummn
  • dog_stage columns: doggo, floofer, pupper, and puppo is not good

image dataframe:

  • duplicated image
  • tweet_id in int64 Dtype
  • not columns for analysis for analysis

tweepy dataframe:

  • non original tweet
  • id column name is not match with other dataframe
  • id in int64 Dtype
  • not useful columns for analysis i.e (id_str, in_reply_to_status_id, in_reply_to_status_id_str, in_reply_to_user_id, in_reply_to_user_id_str, lang, quoted_status_id, and quoted_status_id_str

Tidiness

Tidiness: issues with a structure that prevents easy analysis. Untidy data is also known as messy data.

archive dataframe

  • some dogs have multiple stages

image dataframe:

  • p1, p1_conf, p1_dog, p2, p2_conf, p2_dog, p3, p3_conf, p3_dog

tweepy dataframe:

-

make all dataframes into one whole master dataframe

Cleaning Data

The programmatic data cleaning process:

  • Define
  • Code
  • Test

As always, we need to copy our dataframe before do any cleaning process, so we can refer back to the old ones.

Archive Dataframe

What we will do for this dataframe are:

  • remove retweeted row with filtering technique
  • remove not useful for analysis columns using .drop() method
  • change tweet_id datatype into 'object' using .astype() method
  • change timestamp datatype into datetime using .astype() method
  • with some looping we will fix
    • numerator for index 516, 1712, 1202, and 763
    • wrong denominator for index 2335, 342, and 516
  • change 'None' into NaN in name and dog stages colummn using numpy
  • make dog_stage column, then delete the messy columns
In [20]:
# Prepare, copy the original dataframe
archive_df_clean = archive_df.copy()
In [21]:
# Define: Remove not useful for analysis columns

# Code
list = ['in_reply_to_status_id',
        'in_reply_to_user_id', 'source', 'expanded_urls']
archive_df_clean.drop(list, axis=1, inplace=True)

# Test
archive_df_clean.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 13 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   tweet_id                    2356 non-null   int64  
 1   timestamp                   2356 non-null   object 
 2   text                        2356 non-null   object 
 3   retweeted_status_id         181 non-null    float64
 4   retweeted_status_user_id    181 non-null    float64
 5   retweeted_status_timestamp  181 non-null    object 
 6   rating_numerator            2356 non-null   int64  
 7   rating_denominator          2356 non-null   int64  
 8   name                        2356 non-null   object 
 9   doggo                       2356 non-null   object 
 10  floofer                     2356 non-null   object 
 11  pupper                      2356 non-null   object 
 12  puppo                       2356 non-null   object 
dtypes: float64(2), int64(3), object(8)
memory usage: 239.4+ KB

Keep the original tweet

Based on .info() there is 181 row that which is not original tweet

In [22]:
# Define: Select only the row that has null value in retweeted_status_id column

# Code
archive_df_clean = archive_df_clean[archive_df_clean['retweeted_status_id'].isnull()]

# Test
archive_df_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2175 entries, 0 to 2355
Data columns (total 13 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   tweet_id                    2175 non-null   int64  
 1   timestamp                   2175 non-null   object 
 2   text                        2175 non-null   object 
 3   retweeted_status_id         0 non-null      float64
 4   retweeted_status_user_id    0 non-null      float64
 5   retweeted_status_timestamp  0 non-null      object 
 6   rating_numerator            2175 non-null   int64  
 7   rating_denominator          2175 non-null   int64  
 8   name                        2175 non-null   object 
 9   doggo                       2175 non-null   object 
 10  floofer                     2175 non-null   object 
 11  pupper                      2175 non-null   object 
 12  puppo                       2175 non-null   object 
dtypes: float64(2), int64(3), object(8)
memory usage: 237.9+ KB
In [23]:
# Define: Remove not useful for analysis columns

# Code
list = ['retweeted_status_id', 'retweeted_status_user_id',
        'retweeted_status_timestamp']
archive_df_clean.drop(list, axis=1, inplace=True)

# Test
archive_df_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2175 entries, 0 to 2355
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   tweet_id            2175 non-null   int64 
 1   timestamp           2175 non-null   object
 2   text                2175 non-null   object
 3   rating_numerator    2175 non-null   int64 
 4   rating_denominator  2175 non-null   int64 
 5   name                2175 non-null   object
 6   doggo               2175 non-null   object
 7   floofer             2175 non-null   object
 8   pupper              2175 non-null   object
 9   puppo               2175 non-null   object
dtypes: int64(3), object(7)
memory usage: 186.9+ KB

Fix columns dtype (tweet_id, timestamp)

In [24]:
# Define: Fix the wrong dtype using .astype

# Code
dict = {'tweet_id': 'object', 'timestamp': 'datetime64[ns]'}
archive_df_clean = archive_df_clean.astype(dict)

# Test
archive_df_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2175 entries, 0 to 2355
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   tweet_id            2175 non-null   object        
 1   timestamp           2175 non-null   datetime64[ns]
 2   text                2175 non-null   object        
 3   rating_numerator    2175 non-null   int64         
 4   rating_denominator  2175 non-null   int64         
 5   name                2175 non-null   object        
 6   doggo               2175 non-null   object        
 7   floofer             2175 non-null   object        
 8   pupper              2175 non-null   object        
 9   puppo               2175 non-null   object        
dtypes: datetime64[ns](1), int64(2), object(7)
memory usage: 186.9+ KB

Fix wrong numerator and denominator

If we match the numerator and denominator column value with text column, there is some mismatch like wrong detection or not detected decimal value.

Wrong detection
In [25]:
wrong_detection_index = [516, 1202, 2335, 342]
for s in wrong_detection_index:
    print(s, "\t", archive_df['text'][s],
          "\t", archive_df_clean['rating_numerator'][s],
          "\t", archive_df_clean['rating_denominator'][s])
516 	 Meet Sam. She smiles 24/7 &amp; secretly aspires to be a reindeer. 
Keep Sam smiling by clicking and sharing this link:
https://t.co/98tB8y7y7t https://t.co/LouL5vdvxx 	 24 	 7
1202 	 This is Bluebert. He just saw that both #FinalFur match ups are split 50/50. Amazed af. 11/10 https://t.co/Kky1DPG4iq 	 50 	 50
2335 	 This is an Albanian 3 1/2 legged  Episcopalian. Loves well-polished hardwood flooring. Penis on the collar. 9/10 https://t.co/d9NcXFKwLv 	 1 	 2
342 	 @docmisterio account started on 11/15/15 	 11 	 15
In [26]:
# Define: Since this was wrong detection, we do manual update for each occasion
# Index 516, change num and denum to NaN

# Code
archive_df_clean.loc[516, 'rating_numerator'] = np.NaN
archive_df_clean.loc[516, 'rating_denominator'] = np.NaN
In [27]:
# Define: Index 1202, change num to 11 and denum to 10

# Code
archive_df_clean.loc[1202, 'rating_numerator'] = 11
archive_df_clean.loc[1202, 'rating_denominator'] = 10
In [28]:
# Define: Index 2335, change num to 9 and denum to 10

# Code
archive_df_clean.loc[2335, 'rating_numerator'] = 9
archive_df_clean.loc[2335, 'rating_denominator'] = 10
In [29]:
# Define Index 342, change num  and denum to NaN

# Code
archive_df_clean.loc[342, 'rating_numerator'] = np.NaN
archive_df_clean.loc[342, 'rating_denominator'] = np.NaN
In [30]:
# Test all above
wrong_detection_index = [516, 1202, 2335, 342]
for s in wrong_detection_index:
    print(s, "\t", archive_df['text'][s],
          "\t", archive_df_clean['rating_numerator'][s],
          "\t", archive_df_clean['rating_denominator'][s])
516 	 Meet Sam. She smiles 24/7 &amp; secretly aspires to be a reindeer. 
Keep Sam smiling by clicking and sharing this link:
https://t.co/98tB8y7y7t https://t.co/LouL5vdvxx 	 nan 	 nan
1202 	 This is Bluebert. He just saw that both #FinalFur match ups are split 50/50. Amazed af. 11/10 https://t.co/Kky1DPG4iq 	 11.0 	 10.0
2335 	 This is an Albanian 3 1/2 legged  Episcopalian. Loves well-polished hardwood flooring. Penis on the collar. 9/10 https://t.co/d9NcXFKwLv 	 9.0 	 10.0
342 	 @docmisterio account started on 11/15/15 	 nan 	 nan
Decimal value

The decimal numerator is like in index 1712 and 763. Then we have to suspect something else like this, so we do a re-assessment data.

In [31]:
decimal_detection_index = [763, 1712]
for s in decimal_detection_index:
    print(s, "\t", archive_df['text'][s],
          "\t", archive_df_clean['rating_numerator'][s],
          "\t", archive_df_clean['rating_denominator'][s])
763 	 This is Sophie. She's a Jubilant Bush Pupper. Super h*ckin rare. Appears at random just to smile at the locals. 11.27/10 would smile back https://t.co/QFaUiIHxHq 	 27.0 	 10.0
1712 	 Here we have uncovered an entire battalion of holiday puppers. Average of 11.26/10 https://t.co/eNm2S6p9BD 	 26.0 	 10.0
In [32]:
# Check all decimal occasion
for s in archive_df_clean.index.to_list():
    text = archive_df_clean['text'][s]
    regexp = re.compile(r'(\d+\.\d*\/\d+)')
    if regexp.search(text):
        print(s, "\t", archive_df['text'][s],
          "\t", archive_df_clean['rating_numerator'][s],
          "\t", archive_df_clean['rating_denominator'][s])
45 	 This is Bella. She hopes her smile made you smile. If not, she is also offering you her favorite monkey. 13.5/10 https://t.co/qjrljjt948 	 5.0 	 10.0
695 	 This is Logan, the Chow who lived. He solemnly swears he's up to lots of good. H*ckin magical af 9.75/10 https://t.co/yBO5wuqaPS 	 75.0 	 10.0
763 	 This is Sophie. She's a Jubilant Bush Pupper. Super h*ckin rare. Appears at random just to smile at the locals. 11.27/10 would smile back https://t.co/QFaUiIHxHq 	 27.0 	 10.0
1689 	 I've been told there's a slight possibility he's checking his mirror. We'll bump to 9.5/10. Still a menace 	 5.0 	 10.0
1712 	 Here we have uncovered an entire battalion of holiday puppers. Average of 11.26/10 https://t.co/eNm2S6p9BD 	 26.0 	 10.0
In [33]:
# Define: Fix decimal nominator and denominator

# Code
rating = archive_df_clean.text.str.extract('((?:\d+\.)?\d+)\/(\d+)', expand=True)
rating.columns = ['rating_numerator', 'rating_denominator']
archive_df_clean['rating_numerator'] = rating['rating_numerator'].astype(float)
archive_df_clean['rating_denominator'] = rating['rating_denominator'].astype(float)

# Test
for s in archive_df_clean.index.to_list():
    text = archive_df_clean['text'][s]
    regexp = re.compile(r'(\d+\.\d*\/\d+)')
    if regexp.search(text):
        print(s, "\t", archive_df['text'][s],
          "\t", archive_df_clean['rating_numerator'][s],
          "\t", archive_df_clean['rating_denominator'][s])
45 	 This is Bella. She hopes her smile made you smile. If not, she is also offering you her favorite monkey. 13.5/10 https://t.co/qjrljjt948 	 13.5 	 10.0
695 	 This is Logan, the Chow who lived. He solemnly swears he's up to lots of good. H*ckin magical af 9.75/10 https://t.co/yBO5wuqaPS 	 9.75 	 10.0
763 	 This is Sophie. She's a Jubilant Bush Pupper. Super h*ckin rare. Appears at random just to smile at the locals. 11.27/10 would smile back https://t.co/QFaUiIHxHq 	 11.27 	 10.0
1689 	 I've been told there's a slight possibility he's checking his mirror. We'll bump to 9.5/10. Still a menace 	 9.5 	 10.0
1712 	 Here we have uncovered an entire battalion of holiday puppers. Average of 11.26/10 https://t.co/eNm2S6p9BD 	 11.26 	 10.0

Change None value in name columns to NaN

In [34]:
# Define: Change None -> NaN

# Code
archive_df_clean['name'] = archive_df_clean['name'].replace('None', np.NaN)

# Test
archive_df_clean.name.sample(10)
Out[34]:
954        Fred
1159      Sarge
1500      Edgar
2016    Bradley
38         Earl
950       Brody
1053        NaN
1548      Lucky
2185       Ruby
1542        NaN
Name: name, dtype: object

Dog_stage columns

In [35]:
# Before do anly cleaning, we need to change None value to 0

# Define: Change None value to 0

# Code
col = ['doggo', 'floofer', 'pupper', 'puppo']

for c in col:
    archive_df_clean[col] = archive_df_clean[col].replace('None', 0)
    
# Test
archive_df_clean[col].sample(10)
Out[35]:
doggo floofer pupper puppo
2058 0 0 0 0
330 0 0 pupper 0
505 0 0 0 0
31 0 0 0 0
2240 0 0 0 0
1937 0 0 pupper 0
2085 0 0 0 0
1827 0 0 0 0
2250 0 0 0 0
771 0 0 0 0
In [36]:
#Define: We will make dog stage columns into one concise column

# Code
dog_stage = []

for idx, col in archive_df_clean.iterrows():
    doggo = col[-4]
    floofer = col[-3]
    pupper = col[-2]
    puppo = col[-1]
    
    if int(bool(doggo)) + int(bool(floofer)) + int(bool(pupper)) + int(bool(puppo)) == 1:
        if doggo:
            dog_stage.append(doggo)
        elif floofer:
            dog_stage.append(floofer)
        elif pupper:
            dog_stage.append(pupper)
        elif puppo:
            dog_stage.append(puppo)
    elif int(bool(doggo)) + int(bool(floofer)) + int(bool(pupper)) + int(bool(puppo)) > 1:
        dog_stage.append('multiple_stages')
    else:
        dog_stage.append(np.NaN)

# Make new column for image dataframe
archive_df_clean['dog_stage'] = dog_stage

# Test
archive_df_clean['dog_stage'].sample(10)
Out[36]:
1952       NaN
87         NaN
1345       NaN
1571    pupper
205        NaN
1148       NaN
1958       NaN
2230       NaN
2254       NaN
893        NaN
Name: dog_stage, dtype: object
In [37]:
archive_df_clean['dog_stage'].value_counts()
Out[37]:
pupper             224
doggo               75
puppo               24
multiple_stages     12
floofer              9
Name: dog_stage, dtype: int64

Since we found 12 rows with multiple_stages, we need to examine further. It's possible caused by more than one dogs in the post or wrong auto-detection.

Multiple dog stages

In [38]:
archive_df_clean[archive_df_clean['dog_stage'] == 'multiple_stages']
Out[38]:
tweet_id timestamp text rating_numerator rating_denominator name doggo floofer pupper puppo dog_stage
191 855851453814013952 2017-04-22 18:31:02 Here's a puppo participating in the #ScienceMa... 13.0 10.0 NaN doggo 0 0 puppo multiple_stages
200 854010172552949760 2017-04-17 16:34:26 At first I thought this was a shy doggo, but i... 11.0 10.0 NaN doggo floofer 0 0 multiple_stages
460 817777686764523521 2017-01-07 16:59:28 This is Dido. She's playing the lead role in "... 13.0 10.0 Dido doggo 0 pupper 0 multiple_stages
531 808106460588765185 2016-12-12 00:29:28 Here we have Burke (pupper) and Dexter (doggo)... 12.0 10.0 NaN doggo 0 pupper 0 multiple_stages
565 802265048156610565 2016-11-25 21:37:47 Like doggo, like pupper version 2. Both 11/10 ... 11.0 10.0 NaN doggo 0 pupper 0 multiple_stages
575 801115127852503040 2016-11-22 17:28:25 This is Bones. He's being haunted by another d... 12.0 10.0 Bones doggo 0 pupper 0 multiple_stages
705 785639753186217984 2016-10-11 00:34:48 This is Pinot. He's a sophisticated doggo. You... 10.0 10.0 Pinot doggo 0 pupper 0 multiple_stages
733 781308096455073793 2016-09-29 01:42:20 Pupper butt 1, Doggo 0. Both 12/10 https://t.c... 12.0 10.0 NaN doggo 0 pupper 0 multiple_stages
889 759793422261743616 2016-07-31 16:50:42 Meet Maggie &amp; Lila. Maggie is the doggo, L... 12.0 10.0 Maggie doggo 0 pupper 0 multiple_stages
956 751583847268179968 2016-07-09 01:08:47 Please stop sending it pictures that don't eve... 5.0 10.0 NaN doggo 0 pupper 0 multiple_stages
1063 741067306818797568 2016-06-10 00:39:48 This is just downright precious af. 12/10 for ... 12.0 10.0 just doggo 0 pupper 0 multiple_stages
1113 733109485275860992 2016-05-19 01:38:16 Like father (doggo), like son (pupper). Both 1... 12.0 10.0 NaN doggo 0 pupper 0 multiple_stages
In [39]:
# Visually check the stages from the text post
multiple_stages_index = archive_df_clean[archive_df_clean['dog_stage'] == 'multiple_stages'].index.to_list()
for s in multiple_stages_index:
    print(s, "\t", archive_df_clean['text'][s],
          "\n", archive_df_clean['dog_stage'][s])
191 	 Here's a puppo participating in the #ScienceMarch. Cleverly disguising her own doggo agenda. 13/10 would keep the planet habitable for https://t.co/cMhq16isel 
 multiple_stages
200 	 At first I thought this was a shy doggo, but it's actually a Rare Canadian Floofer Owl. Amateurs would confuse the two. 11/10 only send dogs https://t.co/TXdT3tmuYk 
 multiple_stages
460 	 This is Dido. She's playing the lead role in "Pupper Stops to Catch Snow Before Resuming Shadow Box with Dried Apple." 13/10 (IG: didodoggo) https://t.co/m7isZrOBX7 
 multiple_stages
531 	 Here we have Burke (pupper) and Dexter (doggo). Pupper wants to be exactly like doggo. Both 12/10 would pet at same time https://t.co/ANBpEYHaho 
 multiple_stages
565 	 Like doggo, like pupper version 2. Both 11/10 https://t.co/9IxWAXFqze 
 multiple_stages
575 	 This is Bones. He's being haunted by another doggo of roughly the same size. 12/10 deep breaths pupper everything's fine https://t.co/55Dqe0SJNj 
 multiple_stages
705 	 This is Pinot. He's a sophisticated doggo. You can tell by the hat. Also pointier than your average pupper. Still 10/10 would pet cautiously https://t.co/f2wmLZTPHd 
 multiple_stages
733 	 Pupper butt 1, Doggo 0. Both 12/10 https://t.co/WQvcPEpH2u 
 multiple_stages
889 	 Meet Maggie &amp; Lila. Maggie is the doggo, Lila is the pupper. They are sisters. Both 12/10 would pet at the same time https://t.co/MYwR4DQKll 
 multiple_stages
956 	 Please stop sending it pictures that don't even have a doggo or pupper in them. Churlish af. 5/10 neat couch tho https://t.co/u2c9c7qSg8 
 multiple_stages
1063 	 This is just downright precious af. 12/10 for both pupper and doggo https://t.co/o5J479bZUC 
 multiple_stages
1113 	 Like father (doggo), like son (pupper). Both 12/10 https://t.co/pG2inLaOda 
 multiple_stages
In [40]:
# Define: We need to fix the stages one by one

# Code
# Make dictionary with key=index, value=fixed stage
dict_stage = {191: 'puppo', 
              200: 'floofer',
              460: 'pupper',
              575: 'pupper',
              705: np.NaN, #not even a dog
              965: 'doggo'}

for key, value in dict_stage.items():
    archive_df_clean.loc[key, 'dog_stage'] = value
    
# Test
for s in dict_stage.keys():
    print(s, "\t", archive_df_clean['text'][s],
          "\n", archive_df_clean['dog_stage'][s])
191 	 Here's a puppo participating in the #ScienceMarch. Cleverly disguising her own doggo agenda. 13/10 would keep the planet habitable for https://t.co/cMhq16isel 
 puppo
200 	 At first I thought this was a shy doggo, but it's actually a Rare Canadian Floofer Owl. Amateurs would confuse the two. 11/10 only send dogs https://t.co/TXdT3tmuYk 
 floofer
460 	 This is Dido. She's playing the lead role in "Pupper Stops to Catch Snow Before Resuming Shadow Box with Dried Apple." 13/10 (IG: didodoggo) https://t.co/m7isZrOBX7 
 pupper
575 	 This is Bones. He's being haunted by another doggo of roughly the same size. 12/10 deep breaths pupper everything's fine https://t.co/55Dqe0SJNj 
 pupper
705 	 This is Pinot. He's a sophisticated doggo. You can tell by the hat. Also pointier than your average pupper. Still 10/10 would pet cautiously https://t.co/f2wmLZTPHd 
 nan
965 	 This is Arnie. He's a Nova Scotian Fridge Floof. Rare af. 12/10 https://t.co/lprdOylVpS 
 doggo
In [41]:
# Define: Change dos_stage dtype to category

# Code
archive_df_clean.dog_stage = archive_df_clean.dog_stage.astype('category')

# Test
archive_df_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2175 entries, 0 to 2355
Data columns (total 11 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   tweet_id            2175 non-null   object        
 1   timestamp           2175 non-null   datetime64[ns]
 2   text                2175 non-null   object        
 3   rating_numerator    2175 non-null   float64       
 4   rating_denominator  2175 non-null   float64       
 5   name                1495 non-null   object        
 6   doggo               2175 non-null   object        
 7   floofer             2175 non-null   object        
 8   pupper              2175 non-null   object        
 9   puppo               2175 non-null   object        
 10  dog_stage           344 non-null    category      
dtypes: category(1), datetime64[ns](1), float64(2), object(7)
memory usage: 269.2+ KB
In [42]:
# Define: Drop doggo, floofer, pupper, puppo column

# Code
stages = ['doggo', 'floofer', 'pupper', 'puppo']
archive_df_clean.drop(stages, axis=1, inplace=True)

# Test
archive_df_clean.sample(10)
Out[42]:
tweet_id timestamp text rating_numerator rating_denominator name dog_stage
2028 671866342182637568 2015-12-02 01:39:53 Meet Dylan. He can use a fork but clearly can'... 10.0 10.0 Dylan NaN
1871 675147105808306176 2015-12-11 02:56:28 When you're presenting a group project and the... 10.0 10.0 NaN NaN
1903 674638615994089473 2015-12-09 17:15:54 This pupper is fed up with being tickled. 12/1... 12.0 10.0 NaN pupper
1743 679405845277462528 2015-12-22 20:59:10 Crazy unseen footage from Jurassic Park. 10/10... 10.0 10.0 NaN NaN
468 817056546584727552 2017-01-05 17:13:55 This is Chloe. She fell asleep at the wheel. A... 11.0 10.0 Chloe NaN
1062 741099773336379392 2016-06-10 02:48:49 This is Ted. He's given up. 11/10 relatable af... 11.0 10.0 Ted NaN
2293 667152164079423490 2015-11-19 01:27:25 This is Pipsy. He is a fluffball. Enjoys trave... 12.0 10.0 Pipsy NaN
372 828381636999917570 2017-02-05 23:15:47 Meet Doobert. He's a deaf doggo. Didn't stop h... 14.0 10.0 Doobert doggo
576 800859414831898624 2016-11-22 00:32:18 @SkyWilliams doggo simply protecting you from ... 11.0 10.0 NaN doggo
503 813066809284972545 2016-12-25 17:00:08 This is Tyr. He is disgusted by holiday traffi... 12.0 10.0 Tyr NaN
In [43]:
# Define: Final check and reset index

# Code
archive_df_clean.reset_index(drop=True, inplace=True)

# Test
archive_df_clean
Out[43]:
tweet_id timestamp text rating_numerator rating_denominator name dog_stage
0 892420643555336193 2017-08-01 16:23:56 This is Phineas. He's a mystical boy. Only eve... 13.0 10.0 Phineas NaN
1 892177421306343426 2017-08-01 00:17:27 This is Tilly. She's just checking pup on you.... 13.0 10.0 Tilly NaN
2 891815181378084864 2017-07-31 00:18:03 This is Archie. He is a rare Norwegian Pouncin... 12.0 10.0 Archie NaN
3 891689557279858688 2017-07-30 15:58:51 This is Darla. She commenced a snooze mid meal... 13.0 10.0 Darla NaN
4 891327558926688256 2017-07-29 16:00:24 This is Franklin. He would like you to stop ca... 12.0 10.0 Franklin NaN
... ... ... ... ... ... ... ...
2170 666049248165822465 2015-11-16 00:24:50 Here we have a 1949 1st generation vulpix. Enj... 5.0 10.0 NaN NaN
2171 666044226329800704 2015-11-16 00:04:52 This is a purebred Piers Morgan. Loves to Netf... 6.0 10.0 a NaN
2172 666033412701032449 2015-11-15 23:21:54 Here is a very happy pup. Big fan of well-main... 9.0 10.0 a NaN
2173 666029285002620928 2015-11-15 23:05:30 This is a western brown Mitsubishi terrier. Up... 7.0 10.0 a NaN
2174 666020888022790149 2015-11-15 22:32:08 Here we have a Japanese Irish Setter. Lost eye... 8.0 10.0 NaN NaN

2175 rows × 7 columns

Image Dataframe

What we will do for this dataframe are:

  • remove duplicated image row
  • change tweet_id in into object datatype
  • remove all not useful columns for analysis for analysis
  • select one of p1, p1_conf, p1_dog, p2, p2_conf, p2_dog, p3, p3_conf, p3_dog
In [44]:
image_df_clean = image_df.copy()
image_df_clean
Out[44]:
tweet_id jpg_url img_num p1 p1_conf p1_dog p2 p2_conf p2_dog p3 p3_conf p3_dog
0 666020888022790149 https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg 1 Welsh_springer_spaniel 0.465074 True collie 0.156665 True Shetland_sheepdog 0.061428 True
1 666029285002620928 https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg 1 redbone 0.506826 True miniature_pinscher 0.074192 True Rhodesian_ridgeback 0.072010 True
2 666033412701032449 https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg 1 German_shepherd 0.596461 True malinois 0.138584 True bloodhound 0.116197 True
3 666044226329800704 https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg 1 Rhodesian_ridgeback 0.408143 True redbone 0.360687 True miniature_pinscher 0.222752 True
4 666049248165822465 https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg 1 miniature_pinscher 0.560311 True Rottweiler 0.243682 True Doberman 0.154629 True
... ... ... ... ... ... ... ... ... ... ... ... ...
2070 891327558926688256 https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg 2 basset 0.555712 True English_springer 0.225770 True German_short-haired_pointer 0.175219 True
2071 891689557279858688 https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg 1 paper_towel 0.170278 False Labrador_retriever 0.168086 True spatula 0.040836 False
2072 891815181378084864 https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg 1 Chihuahua 0.716012 True malamute 0.078253 True kelpie 0.031379 True
2073 892177421306343426 https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg 1 Chihuahua 0.323581 True Pekinese 0.090647 True papillon 0.068957 True
2074 892420643555336193 https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg 1 orange 0.097049 False bagel 0.085851 False banana 0.076110 False

2075 rows × 12 columns

Fix tweet_id column dtype

In [45]:
# Define: Change tweet_id dtype to object

# Code
image_df_clean.tweet_id = image_df_clean.tweet_id.astype('object')

# Test
image_df_clean.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075 entries, 0 to 2074
Data columns (total 12 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   tweet_id  2075 non-null   object 
 1   jpg_url   2075 non-null   object 
 2   img_num   2075 non-null   int64  
 3   p1        2075 non-null   object 
 4   p1_conf   2075 non-null   float64
 5   p1_dog    2075 non-null   bool   
 6   p2        2075 non-null   object 
 7   p2_conf   2075 non-null   float64
 8   p2_dog    2075 non-null   bool   
 9   p3        2075 non-null   object 
 10  p3_conf   2075 non-null   float64
 11  p3_dog    2075 non-null   bool   
dtypes: bool(3), float64(3), int64(1), object(5)
memory usage: 152.1+ KB

Remove duplicated jpg_url

From the assessment, we found that there is 66 row with duplicated jpg_url.

In [46]:
image_df_clean[image_df_clean.jpg_url.duplicated()]
Out[46]:
tweet_id jpg_url img_num p1 p1_conf p1_dog p2 p2_conf p2_dog p3 p3_conf p3_dog
1297 752309394570878976 https://pbs.twimg.com/ext_tw_video_thumb/67535... 1 upright 0.303415 False golden_retriever 0.181351 True Brittany_spaniel 0.162084 True
1315 754874841593970688 https://pbs.twimg.com/media/CWza7kpWcAAdYLc.jpg 1 pug 0.272205 True bull_mastiff 0.251530 True bath_towel 0.116806 False
1333 757729163776290825 https://pbs.twimg.com/media/CWyD2HGUYAQ1Xa7.jpg 2 cash_machine 0.802333 False schipperke 0.045519 True German_shepherd 0.023353 True
1345 759159934323924993 https://pbs.twimg.com/media/CU1zsMSUAAAS0qW.jpg 1 Irish_terrier 0.254856 True briard 0.227716 True soft-coated_wheaten_terrier 0.223263 True
1349 759566828574212096 https://pbs.twimg.com/media/CkNjahBXAAQ2kWo.jpg 1 Labrador_retriever 0.967397 True golden_retriever 0.016641 True ice_bear 0.014858 False
... ... ... ... ... ... ... ... ... ... ... ... ...
1903 851953902622658560 https://pbs.twimg.com/media/C4KHj-nWQAA3poV.jpg 1 Staffordshire_bullterrier 0.757547 True American_Staffordshire_terrier 0.149950 True Chesapeake_Bay_retriever 0.047523 True
1944 861769973181624320 https://pbs.twimg.com/media/CzG425nWgAAnP7P.jpg 2 Arabian_camel 0.366248 False house_finch 0.209852 False cocker_spaniel 0.046403 True
1992 873697596434513921 https://pbs.twimg.com/media/DA7iHL5U0AA1OQo.jpg 1 laptop 0.153718 False French_bulldog 0.099984 True printer 0.077130 False
2041 885311592912609280 https://pbs.twimg.com/media/C4bTH6nWMAAX_bJ.jpg 1 Labrador_retriever 0.908703 True seat_belt 0.057091 False pug 0.011933 True
2055 888202515573088257 https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg 2 Pembroke 0.809197 True Rhodesian_ridgeback 0.054950 True beagle 0.038915 True

66 rows × 12 columns

In [47]:
# Define: Drop the duplicated

# Code
image_df_clean.drop_duplicates(subset='jpg_url', keep='first', inplace=True)

# Test
image_df_clean[image_df_clean.jpg_url.duplicated()]
Out[47]:
tweet_id jpg_url img_num p1 p1_conf p1_dog p2 p2_conf p2_dog p3 p3_conf p3_dog

Prediction columns

Make new columns for p, p_conf, and p_dog only, based on prediction.

In [48]:
# Define
# Make iteration with if function to determine dog breed/type and p_conf score,
# based on boolean value in p1, p2, or p3

# Code
dog_type = []
p_conf = []

for idx, col in image_df_clean.iterrows():
    p1_dog = col[5]
    p2_dog = col[8]
    p3_dog = col[11]
    
    if p1_dog:
        dog_type.append(col[3])
        p_conf.append(col[4])
    elif p3_dog:
        dog_type.append(col[6])
        p_conf.append(col[7])
    elif p3_dog:
        dog_type.append(col[9])
        p_conf.append(col[10])
    else:
        dog_type.append(np.NaN)
        p_conf.append(np.NaN)

# Make new column for image dataframe
image_df_clean['dog_type'] = dog_type
image_df_clean['p_conf'] = p_conf

# Test
image_df_clean.sample(10)
Out[48]:
tweet_id jpg_url img_num p1 p1_conf p1_dog p2 p2_conf p2_dog p3 p3_conf p3_dog dog_type p_conf
1112 724049859469295616 https://pbs.twimg.com/media/CgxXf1TWYAEjY61.jpg 1 Border_collie 0.581835 True collie 0.344588 True Shetland_sheepdog 0.043584 True Border_collie 0.581835
150 668641109086707712 https://pbs.twimg.com/media/CUd9ivxWUAAuXSQ.jpg 1 vacuum 0.432594 False pug 0.146311 True toilet_tissue 0.024500 False NaN NaN
1608 800751577355128832 https://pbs.twimg.com/media/CxzXOyBW8AEu_Oi.jpg 2 cocker_spaniel 0.771984 True miniature_poodle 0.076653 True toy_poodle 0.039618 True cocker_spaniel 0.771984
749 687818504314159109 https://pbs.twimg.com/media/CYufR8_WQAAWCqo.jpg 1 Lakeland_terrier 0.873029 True soft-coated_wheaten_terrier 0.060924 True toy_poodle 0.017031 True Lakeland_terrier 0.873029
1749 823699002998870016 https://pbs.twimg.com/media/C25d3nkXEAAFBUN.jpg 1 cairn 0.203999 True snorkel 0.171893 False Norfolk_terrier 0.107543 True cairn 0.203999
1898 850753642995093505 https://pbs.twimg.com/media/C8576jrW0AEYWFy.jpg 1 pug 0.996952 True bull_mastiff 0.000996 True French_bulldog 0.000883 True pug 0.996952
486 675497103322386432 https://pbs.twimg.com/media/CV_ZAhcUkAUeKtZ.jpg 1 vizsla 0.519589 True miniature_pinscher 0.064771 True Rhodesian_ridgeback 0.061491 True vizsla 0.519589
739 687127927494963200 https://pbs.twimg.com/media/CYkrNIVWcAMswmP.jpg 1 pug 0.178205 True Chihuahua 0.149164 True Shih-Tzu 0.120505 True pug 0.178205
1790 830097400375152640 https://pbs.twimg.com/media/C4UZLZLWYAA0dcs.jpg 4 toy_poodle 0.442713 True Pomeranian 0.142073 True Pekinese 0.125745 True toy_poodle 0.442713
45 666786068205871104 https://pbs.twimg.com/media/CUDmZIkWcAAIPPe.jpg 1 snail 0.999888 False slug 0.000055 False acorn 0.000026 False NaN NaN
In [49]:
# Define: Remove not useful for analysis columns

# Code
columns = image_df_clean.columns[2:-2].to_list()
image_df_clean.drop(columns, axis=1, inplace=True)

# Test
image_df_clean
Out[49]:
tweet_id jpg_url dog_type p_conf
0 666020888022790149 https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg Welsh_springer_spaniel 0.465074
1 666029285002620928 https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg redbone 0.506826
2 666033412701032449 https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg German_shepherd 0.596461
3 666044226329800704 https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg Rhodesian_ridgeback 0.408143
4 666049248165822465 https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg miniature_pinscher 0.560311
... ... ... ... ...
2070 891327558926688256 https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg basset 0.555712
2071 891689557279858688 https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg NaN NaN
2072 891815181378084864 https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg Chihuahua 0.716012
2073 892177421306343426 https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg Chihuahua 0.323581
2074 892420643555336193 https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg NaN NaN

2009 rows × 4 columns

In [50]:
# Define: Change dog_type dtype column to category

# Code
image_df_clean.dog_type = image_df_clean.dog_type.astype('category')

# Test
image_df_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2009 entries, 0 to 2074
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   tweet_id  2009 non-null   object  
 1   jpg_url   2009 non-null   object  
 2   dog_type  1638 non-null   category
 3   p_conf    1638 non-null   float64 
dtypes: category(1), float64(1), object(2)
memory usage: 73.0+ KB
In [51]:
# Define: Final check and reset index

# Code
image_df_clean.reset_index(drop=True, inplace=True)

# Test
image_df_clean
Out[51]:
tweet_id jpg_url dog_type p_conf
0 666020888022790149 https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg Welsh_springer_spaniel 0.465074
1 666029285002620928 https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg redbone 0.506826
2 666033412701032449 https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg German_shepherd 0.596461
3 666044226329800704 https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg Rhodesian_ridgeback 0.408143
4 666049248165822465 https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg miniature_pinscher 0.560311
... ... ... ... ...
2004 891327558926688256 https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg basset 0.555712
2005 891689557279858688 https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg NaN NaN
2006 891815181378084864 https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg Chihuahua 0.716012
2007 892177421306343426 https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg Chihuahua 0.323581
2008 892420643555336193 https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg NaN NaN

2009 rows × 4 columns

Tweepy Dataframe

What we will do for this dataframe are:

  • remove non original tweet
  • change id column name to tweet_id then change the datatype to 'object'
  • remove not useful columns for analysis i.e (id_str, in_reply_to_status_id, in_reply_to_status_id_str, in_reply_to_user_id, in_reply_to_user_id_str, lang, quoted_status_id, and quoted_status_id_str
In [52]:
# Copy the original dataframe first
tweepy_df_clean = tweepy_df.copy()
tweepy_df_clean.sample(10)
Out[52]:
created_at id id_str full_text truncated display_text_range entities extended_entities source in_reply_to_status_id ... favorited retweeted possibly_sensitive possibly_sensitive_appealable lang retweeted_status quoted_status_id quoted_status_id_str quoted_status_permalink quoted_status
821 2016-08-17 01:20:27+00:00 765719909049503744 765719909049503744 This is Brat. He has a hard time being ferocio... False [0, 115] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 765719895086596097, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
643 2016-10-23 19:42:02+00:00 790277117346975746 790277117346975744 This is Bruce. He never backs down from a chal... False [0, 77] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 790277108719386624, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
62 2017-06-28 00:42:13+00:00 879862464715927552 879862464715927552 This is Romeo. He would like to do an entrance... False [0, 91] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 879862459263307776, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
2177 2015-11-23 02:19:29+00:00 668614819948453888 668614819948453888 Here is a horned dog. Much grace. Can jump ove... False [0, 139] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 668614813715664896, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
2003 2015-12-01 05:26:34+00:00 671561002136281088 671561002136281088 This is the best thing I've ever seen so sprea... False [0, 144] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 671561000215298048, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
1837 2015-12-11 03:05:37+00:00 675149409102012420 675149409102012416 holy shit 12/10 https://t.co/p6O8X93bTQ False [0, 39] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 675149402210701313, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
657 2016-10-19 01:29:35+00:00 788552643979468800 788552643979468800 RT @dog_rates: Say hello to mad pupper. You kn... False [0, 130] {'hashtags': [], 'symbols': [], 'user_mentions... NaN <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en {'created_at': 'Sat May 28 03:04:00 +0000 2016... NaN NaN NaN NaN
766 2016-09-07 15:44:53+00:00 773547596996571136 773547596996571136 This is Chelsea. She forgot how to dog. 11/10 ... False [0, 68] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 773547591439122432, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
1331 2016-02-25 19:04:13+00:00 702932127499816960 702932127499816960 This is Chip. He's an Upper West Nile Pantaloo... False [0, 137] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 702932120042397696, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN
1053 2016-06-02 16:10:29+00:00 738402415918125056 738402415918125056 "Don't talk to me or my son ever again" ...10/... False [0, 57] {'hashtags': [], 'symbols': [], 'user_mentions... {'media': [{'id': 738402403196796928, 'id_str'... <a href="http://twitter.com/download/iphone" r... NaN ... False False 0.0 0.0 en NaN NaN NaN NaN NaN

10 rows × 32 columns

Select only useful columns

In [53]:
# Define: Remove id, retweet_count, and favorite_count column

# Code
tweepy_df_clean = tweepy_df_clean[['id', 'retweet_count', 'favorite_count']]

# Test
tweepy_df_clean.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2322 entries, 0 to 2321
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   id              2322 non-null   int64
 1   retweet_count   2322 non-null   int64
 2   favorite_count  2322 non-null   int64
dtypes: int64(3)
memory usage: 54.5 KB

Fix id column, rename and change dtype to object

In [54]:
# Define: Rename id column to tweet_id, then change dtype to object

# Code
tweepy_df_clean = tweepy_df_clean.rename({'id': 'tweet_id'}, axis=1)
tweepy_df_clean.tweet_id = tweepy_df_clean.tweet_id.astype('object')

# Test
tweepy_df_clean.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2322 entries, 0 to 2321
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   tweet_id        2322 non-null   object
 1   retweet_count   2322 non-null   int64 
 2   favorite_count  2322 non-null   int64 
dtypes: int64(2), object(1)
memory usage: 54.5+ KB

Join and Store All Three Dataframes

All dataframe will be merged based on tweet_id as the primary key. The final dataframe will be inner-joined. Then, after final checking, we will save the dataframe to CSV file, named 'twitter_archive_master.csv'.

In [55]:
# Define: Join all three dataframe using .merge() method

# Code
twitter_archive_master = archive_df_clean.merge(image_df_clean,on='tweet_id').merge(tweepy_df_clean,on='tweet_id')
twitter_archive_master.reset_index(drop=True, inplace=True)

# Test
twitter_archive_master
Out[55]:
tweet_id timestamp text rating_numerator rating_denominator name dog_stage jpg_url dog_type p_conf retweet_count favorite_count
0 892420643555336193 2017-08-01 16:23:56 This is Phineas. He's a mystical boy. Only eve... 13.0 10.0 Phineas NaN https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg NaN NaN 7604 35884
1 892177421306343426 2017-08-01 00:17:27 This is Tilly. She's just checking pup on you.... 13.0 10.0 Tilly NaN https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg Chihuahua 0.323581 5631 30943
2 891815181378084864 2017-07-31 00:18:03 This is Archie. He is a rare Norwegian Pouncin... 12.0 10.0 Archie NaN https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg Chihuahua 0.716012 3726 23295
3 891689557279858688 2017-07-30 15:58:51 This is Darla. She commenced a snooze mid meal... 13.0 10.0 Darla NaN https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg NaN NaN 7773 39140
4 891327558926688256 2017-07-29 16:00:24 This is Franklin. He would like you to stop ca... 12.0 10.0 Franklin NaN https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg basset 0.555712 8378 37390
... ... ... ... ... ... ... ... ... ... ... ... ...
1974 666049248165822465 2015-11-16 00:24:50 Here we have a 1949 1st generation vulpix. Enj... 5.0 10.0 NaN NaN https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg miniature_pinscher 0.560311 40 96
1975 666044226329800704 2015-11-16 00:04:52 This is a purebred Piers Morgan. Loves to Netf... 6.0 10.0 a NaN https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg Rhodesian_ridgeback 0.408143 130 269
1976 666033412701032449 2015-11-15 23:21:54 Here is a very happy pup. Big fan of well-main... 9.0 10.0 a NaN https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg German_shepherd 0.596461 41 111
1977 666029285002620928 2015-11-15 23:05:30 This is a western brown Mitsubishi terrier. Up... 7.0 10.0 a NaN https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg redbone 0.506826 42 120
1978 666020888022790149 2015-11-15 22:32:08 Here we have a Japanese Irish Setter. Lost eye... 8.0 10.0 NaN NaN https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg Welsh_springer_spaniel 0.465074 459 2388

1979 rows × 12 columns

In [56]:
# Test, check dtypes
twitter_archive_master.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1979 entries, 0 to 1978
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   tweet_id            1979 non-null   object        
 1   timestamp           1979 non-null   datetime64[ns]
 2   text                1979 non-null   object        
 3   rating_numerator    1979 non-null   float64       
 4   rating_denominator  1979 non-null   float64       
 5   name                1436 non-null   object        
 6   dog_stage           301 non-null    category      
 7   jpg_url             1979 non-null   object        
 8   dog_type            1619 non-null   category      
 9   p_conf              1619 non-null   float64       
 10  retweet_count       1979 non-null   int64         
 11  favorite_count      1979 non-null   int64         
dtypes: category(2), datetime64[ns](1), float64(3), int64(2), object(4)
memory usage: 167.0+ KB
In [57]:
# Define: Save complete dataframe into CSV file

# Code
twitter_archive_master.to_csv('twitter_archive_master.csv', index=False)

# Test
os.path.isfile('./twitter_archive_master.csv')
Out[57]:
True

Analysis and Visualization

In [58]:
twitter_archive_master
Out[58]:
tweet_id timestamp text rating_numerator rating_denominator name dog_stage jpg_url dog_type p_conf retweet_count favorite_count
0 892420643555336193 2017-08-01 16:23:56 This is Phineas. He's a mystical boy. Only eve... 13.0 10.0 Phineas NaN https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg NaN NaN 7604 35884
1 892177421306343426 2017-08-01 00:17:27 This is Tilly. She's just checking pup on you.... 13.0 10.0 Tilly NaN https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg Chihuahua 0.323581 5631 30943
2 891815181378084864 2017-07-31 00:18:03 This is Archie. He is a rare Norwegian Pouncin... 12.0 10.0 Archie NaN https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg Chihuahua 0.716012 3726 23295
3 891689557279858688 2017-07-30 15:58:51 This is Darla. She commenced a snooze mid meal... 13.0 10.0 Darla NaN https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg NaN NaN 7773 39140
4 891327558926688256 2017-07-29 16:00:24 This is Franklin. He would like you to stop ca... 12.0 10.0 Franklin NaN https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg basset 0.555712 8378 37390
... ... ... ... ... ... ... ... ... ... ... ... ...
1974 666049248165822465 2015-11-16 00:24:50 Here we have a 1949 1st generation vulpix. Enj... 5.0 10.0 NaN NaN https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg miniature_pinscher 0.560311 40 96
1975 666044226329800704 2015-11-16 00:04:52 This is a purebred Piers Morgan. Loves to Netf... 6.0 10.0 a NaN https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg Rhodesian_ridgeback 0.408143 130 269
1976 666033412701032449 2015-11-15 23:21:54 Here is a very happy pup. Big fan of well-main... 9.0 10.0 a NaN https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg German_shepherd 0.596461 41 111
1977 666029285002620928 2015-11-15 23:05:30 This is a western brown Mitsubishi terrier. Up... 7.0 10.0 a NaN https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg redbone 0.506826 42 120
1978 666020888022790149 2015-11-15 22:32:08 Here we have a Japanese Irish Setter. Lost eye... 8.0 10.0 NaN NaN https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg Welsh_springer_spaniel 0.465074 459 2388

1979 rows × 12 columns

Most common name for dog

In [59]:
twitter_archive_master.name.value_counts()
Out[59]:
a          55
Charlie    10
Oliver     10
Cooper     10
Lucy        9
           ..
Joey        1
Evy         1
Bloop       1
Shadoe      1
Sailer      1
Name: name, Length: 930, dtype: int64
In [60]:
twitter_archive_master.name.value_counts().head(10).plot(kind='barh')
plt.title('Dog Name Count')
plt.xlabel('Name Count')
plt.ylabel('Dog Name');
In [61]:
named_a = twitter_archive_master.index[twitter_archive_master.name == 'a']

for s in named_a:
    print(s, "\t", twitter_archive_master['text'][s])
49 	 Here is a pupper approaching maximum borkdrive. Zooming at never before seen speeds. 14/10 paw-inspiring af 
(IG: puffie_the_chow) https://t.co/ghXBIIeQZF
462 	 Here is a perfect example of someone who has their priorities in order. 13/10 for both owner and Forrest https://t.co/LRyMrU7Wfq
571 	 Guys this is getting so out of hand. We only rate dogs. This is a Galapagos Speed Panda. Pls only send dogs... 10/10 https://t.co/8lpAGaZRFn
734 	 This is a mighty rare blue-tailed hammer sherk. Human almost lost a limb trying to take these. Be careful guys. 8/10 https://t.co/TGenMeXreW
736 	 Viewer discretion is advised. This is a terrible attack in progress. Not even in water (tragic af). 4/10 bad sherk https://t.co/L3U0j14N5R
745 	 This is a carrot. We only rate dogs. Please only send in dogs. You all really should know this by now ...11/10 https://t.co/9e48aPrBm2
771 	 This is a very rare Great Alaskan Bush Pupper. Hard to stumble upon without spooking. 12/10 would pet passionately https://t.co/xOBKCdpzaa
907 	 People please. This is a Deadly Mediterranean Plop T-Rex. We only rate dogs. Only send in dogs. Thanks you... 11/10 https://t.co/2ATDsgHD4n
917 	 This is a taco. We only rate dogs. Please only send in dogs. Dogs are what we rate. Not tacos. Thank you... 10/10 https://t.co/cxl6xGY8B9
1032 	 Here is a heartbreaking scene of an incredible pupper being laid to rest. 10/10 RIP pupper https://t.co/81mvJ0rGRu
1041 	 Here is a whole flock of puppers.  60/50 I'll take the lot https://t.co/9dpcw6MdWa
1051 	 This is a Butternut Cumberfloof. It's not windy they just look like that. 11/10 back at it again with the red socks https://t.co/hMjzhdUHaW
1057 	 This is a Wild Tuscan Poofwiggle. Careful not to startle. Rare tongue slip. One eye magical. 12/10 would def pet https://t.co/4EnShAQjv6
1069 	 "Pupper is a present to world. Here is a bow for pupper." 12/10 precious as hell https://t.co/ItSsE92gCW
1172 	 This is a rare Arctic Wubberfloof. Unamused by the happenings. No longer has the appetites. 12/10 would totally hug https://t.co/krvbacIX0N
1384 	 Guys this really needs to stop. We've been over this way too many times. This is a giraffe. We only rate dogs.. 7/10 https://t.co/yavgkHYPOC
1427 	 This is a dog swinging. I really enjoyed it so I hope you all do as well. 11/10 https://t.co/Ozo9KHTRND
1489 	 This is a Sizzlin Menorah spaniel from Brooklyn named Wylie. Lovable eyes. Chiller as hell. 10/10 and I'm out.. poof https://t.co/7E0AiJXPmI
1490 	 Seriously guys?! Only send in dogs. I only rate dogs. This is a baby black bear... 11/10 https://t.co/H7kpabTfLj
1513 	 C'mon guys. We've been over this. We only rate dogs. This is a cow. Please only submit dogs. Thank you...... 9/10 https://t.co/WjcELNEqN2
1514 	 This is a fluffy albino Bacardi Columbia mix. Excellent at the tweets. 11/10 would hug gently https://t.co/diboDRUuEI
1555 	 This is a Sagitariot Baklava mix. Loves her new hat. 11/10 radiant pup https://t.co/Bko5kFJYUU
1572 	 This is a heavily opinionated dog. Loves walls. Nobody knows how the hair works. Always ready for a kiss. 4/10 https://t.co/dFiaKZ9cDl
1586 	 This is a Lofted Aphrodisiac Terrier named Kip. Big fan of bed n breakfasts. Fits perfectly. 10/10 would pet firmly https://t.co/gKlLpNzIl3
1624 	 This is a baby Rand Paul. Curls for days. 11/10 would cuddle the hell out of https://t.co/xHXNaPAYRe
1664 	 This is a Tuscaloosa Alcatraz named Jacob (Yacōb). Loves to sit in swing. Stellar tongue. 11/10 look at his feet https://t.co/2IslQ8ZSc7
1695 	 This is a Helvetica Listerine named Rufus. This time Rufus will be ready for the UPS guy. He'll never expect it 9/10 https://t.co/34OhVhMkVr
1745 	 This is a Deciduous Trimester mix named Spork. Only 1 ear works. No seat belt. Incredibly reckless. 9/10 still cute https://t.co/CtuJoLHiDo
1754 	 This is a Rich Mahogany Seltzer named Cherokee. Just got destroyed by a snowball. Isn't very happy about it. 9/10 https://t.co/98ZBi6o4dj
1757 	 This is a Speckled Cauliflower Yosemite named Hemry. He's terrified of intruder dog. Not one bit comfortable. 9/10 https://t.co/yV3Qgjh8iN
1775 	 This is a spotted Lipitor Rumpelstiltskin named Alphred. He can't wait for the Turkey. 10/10 would pet really well https://t.co/6GUGO7azNX
1781 	 This is a brave dog. Excellent free climber. Trying to get closer to God. Not very loyal though. Doesn't bark. 5/10 https://t.co/ODnILTr4QM
1789 	 This is a Coriander Baton Rouge named Alfredo. Loves to cuddle with smaller well-dressed dog. 10/10 would hug lots https://t.co/eCRdwouKCl
1818 	 This is a Slovakian Helter Skelter Feta named Leroi. Likes to skip on roofs. Good traction. Much balance. 10/10 wow! https://t.co/Dmy2mY2Qj5
1825 	 This is a wild Toblerone from Papua New Guinea. Mouth always open. Addicted to hay. Acts blind. 7/10 handsome dog https://t.co/IGmVbz07tZ
1838 	 Here is a horned dog. Much grace. Can jump over moons (dam!). Paws not soft. Bad at barking. 7/10 can still pet tho https://t.co/2Su7gmsnZm
1844 	 This is a Birmingham Quagmire named Chuk. Loves to relax and watch the game while sippin on that iced mocha. 10/10 https://t.co/HvNg9JWxFt
1848 	 Here is a mother dog caring for her pups. Snazzy red mohawk. Doesn't wag tail. Pups look confused. Overall 4/10 https://t.co/YOHe6lf09m
1861 	 This is a Trans Siberian Kellogg named Alfonso. Huge ass eyeballs. Actually Dobby from Harry Potter. 7/10 https://t.co/XpseHBlAAb
1875 	 This is a Shotokon Macadamia mix named Cheryl. Sophisticated af. Looks like a disappointed librarian. Shh (lol) 9/10 https://t.co/J4GnJ5Swba
1881 	 This is a rare Hungarian Pinot named Jessiga. She is either mid-stroke or got stuck in the washing machine. 8/10 https://t.co/ZU0i0KJyqD
1888 	 This is a southwest Coriander named Klint. Hat looks expensive. Still on house arrest :(
9/10 https://t.co/IQTOMqDUIe
1897 	 This is a northern Wahoo named Kohl. He runs this town. Chases tumbleweeds. Draws gun wicked fast. 11/10 legendary https://t.co/J4vn2rOYFk
1911 	 This is a Dasani Kingfisher from Maine. His name is Daryl. Daryl doesn't like being swallowed by a panda. 8/10 https://t.co/jpaeu6LNmW
1927 	 This is a curly Ticonderoga named Pepe. No feet. Loves to jet ski. 11/10 would hug until forever https://t.co/cyDfaK8NBc
1934 	 This is a purebred Bacardi named Octaviath. Can shoot spaghetti out of mouth. 10/10 https://t.co/uEvsGLOFHa
1937 	 This is a golden Buckminsterfullerene named Johm. Drives trucks. Lumberjack (?). Enjoys wall. 8/10 would hug softly https://t.co/uQbZJM2DQB
1950 	 This is a southern Vesuvius bumblegruff. Can drive a truck (wow). Made friends with 5 other nifty dogs (neat). 7/10 https://t.co/LopTBkKa8h
1957 	 This is a funny dog. Weird toes. Won't come down. Loves branch. Refuses to eat his food. Hard to cuddle with. 3/10 https://t.co/IIXis0zta0
1970 	 My oh my. This is a rare blond Canadian terrier on wheels. Only $8.98. Rather docile. 9/10 very rare https://t.co/yWBqbrzy8O
1971 	 Here is a Siberian heavily armored polar bear mix. Strong owner. 10/10 I would do unspeakable things to pet this dog https://t.co/rdivxLiqEt
1973 	 This is a truly beautiful English Wilson Staff retriever. Has a nice phone. Privileged. 10/10 would trade lives with https://t.co/fvIbQfHjIe
1975 	 This is a purebred Piers Morgan. Loves to Netflix and chill. Always looks like he forgot to unplug the iron. 6/10 https://t.co/DWnyCjf2mx
1976 	 Here is a very happy pup. Big fan of well-maintained decks. Just look at that tongue. 9/10 would cuddle af https://t.co/y671yMhoiR
1977 	 This is a western brown Mitsubishi terrier. Upset about leaf. Actually 2 dogs here. 7/10 would walk the shit out of https://t.co/r7mOb2m0UI

Dogs has varies name given by it's owner. This is kind of interesting, from the detection, people tends not to share their dog name to the WeRateDogs users. Usually people only share only it's stage or type in the Twitter.

For the most common name for dog posted is Oliver, Cooper, and Charlie, each with count 10.

Most common dog_type

In [62]:
twitter_archive_master.dog_type.value_counts()
Out[62]:
golden_retriever      147
Labrador_retriever     98
Pembroke               93
Chihuahua              84
pug                    55
                     ... 
Japanese_spaniel        1
loggerhead              1
maillot                 1
mink                    1
wood_rabbit             1
Name: dog_type, Length: 164, dtype: int64
In [63]:
twitter_archive_master.dog_type.value_counts().head(10).plot(kind='barh')
plt.title('Dog Type Post Count')
plt.xlabel('Post Count')
plt.ylabel('Dog Type');
In [64]:
golden_retriever = twitter_archive_master[twitter_archive_master['dog_type'] == 'golden_retriever']['jpg_url'].values[0]
response = requests.get(golden_retriever)
print('One of the most popular dog')
Image.open(BytesIO(response.content))
One of the most popular dog
Out[64]:
In [65]:
counts = ['retweet_count', 'favorite_count']
sum_count = twitter_archive_master.groupby(['dog_type'])['retweet_count', 'favorite_count'].sum().sort_values(by=counts, ascending=False)
sum_count
<ipython-input-65-2aafc70c5a2a>:2: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.
  sum_count = twitter_archive_master.groupby(['dog_type'])['retweet_count', 'favorite_count'].sum().sort_values(by=counts, ascending=False)
Out[65]:
retweet_count favorite_count
dog_type
golden_retriever 483833 1681869
Labrador_retriever 322538 1023817
Pembroke 251702 942916
Chihuahua 199845 641301
Samoyed 158874 480634
... ... ...
groenendael 363 1727
corn 342 1052
hyena 273 1285
indri 192 523
hair_spray 79 310

164 rows × 2 columns

The most common type in WeRateDogs is Golden Retriever and it has the most retweet count and favorite count among the all.

But for the average of retweet and favourite count, the most count is House Finch. The Golden Retriever event not in top 10 of the list.

In [66]:
mean_count = twitter_archive_master.groupby(['dog_type'])['retweet_count', 'favorite_count'].mean().sort_values(by=counts, ascending=False)
mean_count.head(10)
<ipython-input-66-20b93dbab5ee>:1: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.
  mean_count = twitter_archive_master.groupby(['dog_type'])['retweet_count', 'favorite_count'].mean().sort_values(by=counts, ascending=False)
Out[66]:
retweet_count favorite_count
dog_type
house_finch 35006.000000 75477.000000
leafhopper 30004.000000 74161.000000
oscilloscope 12614.000000 27701.000000
Bedlington_terrier 7225.500000 22790.833333
standard_poodle 5200.625000 13054.250000
Afghan_hound 5156.666667 15630.000000
Eskimo_dog 4772.578947 13361.894737
English_springer 4725.300000 12878.000000
academic_gown 4593.000000 19207.000000
Saluki 4459.250000 22022.000000

Dog type rating

In [67]:
# First, we need make new column, which is rating for each post
numerator = twitter_archive_master.rating_numerator
denominator = twitter_archive_master.rating_denominator
twitter_archive_master['rating'] = numerator / denominator
twitter_archive_master
Out[67]:
tweet_id timestamp text rating_numerator rating_denominator name dog_stage jpg_url dog_type p_conf retweet_count favorite_count rating
0 892420643555336193 2017-08-01 16:23:56 This is Phineas. He's a mystical boy. Only eve... 13.0 10.0 Phineas NaN https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg NaN NaN 7604 35884 1.3
1 892177421306343426 2017-08-01 00:17:27 This is Tilly. She's just checking pup on you.... 13.0 10.0 Tilly NaN https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg Chihuahua 0.323581 5631 30943 1.3
2 891815181378084864 2017-07-31 00:18:03 This is Archie. He is a rare Norwegian Pouncin... 12.0 10.0 Archie NaN https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg Chihuahua 0.716012 3726 23295 1.2
3 891689557279858688 2017-07-30 15:58:51 This is Darla. She commenced a snooze mid meal... 13.0 10.0 Darla NaN https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg NaN NaN 7773 39140 1.3
4 891327558926688256 2017-07-29 16:00:24 This is Franklin. He would like you to stop ca... 12.0 10.0 Franklin NaN https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg basset 0.555712 8378 37390 1.2
... ... ... ... ... ... ... ... ... ... ... ... ... ...
1974 666049248165822465 2015-11-16 00:24:50 Here we have a 1949 1st generation vulpix. Enj... 5.0 10.0 NaN NaN https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg miniature_pinscher 0.560311 40 96 0.5
1975 666044226329800704 2015-11-16 00:04:52 This is a purebred Piers Morgan. Loves to Netf... 6.0 10.0 a NaN https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg Rhodesian_ridgeback 0.408143 130 269 0.6
1976 666033412701032449 2015-11-15 23:21:54 Here is a very happy pup. Big fan of well-main... 9.0 10.0 a NaN https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg German_shepherd 0.596461 41 111 0.9
1977 666029285002620928 2015-11-15 23:05:30 This is a western brown Mitsubishi terrier. Up... 7.0 10.0 a NaN https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg redbone 0.506826 42 120 0.7
1978 666020888022790149 2015-11-15 22:32:08 Here we have a Japanese Irish Setter. Lost eye... 8.0 10.0 NaN NaN https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg Welsh_springer_spaniel 0.465074 459 2388 0.8

1979 rows × 13 columns

Most and lowest rate

In [68]:
rating = twitter_archive_master.groupby(['dog_type']).sum().sort_values(by=['rating'], ascending=False)
rating[['rating_numerator', 'rating_denominator', 'rating']]
Out[68]:
rating_numerator rating_denominator rating
dog_type
golden_retriever 1942.5 1668.0 173.196753
Labrador_retriever 1352.0 1220.0 108.800000
Pembroke 1059.0 930.0 105.900000
Chihuahua 899.0 840.0 89.900000
pug 565.0 550.0 56.500000
... ... ... ...
mosquito_net 8.0 10.0 0.800000
ram 7.0 10.0 0.700000
sunglasses 6.0 10.0 0.600000
Japanese_spaniel 5.0 10.0 0.500000
loggerhead 3.0 10.0 0.300000

164 rows × 3 columns

In [69]:
loggerhead = twitter_archive_master[twitter_archive_master['dog_type'] == 'loggerhead']['jpg_url'].values[0]
response = requests.get(loggerhead)
print('One of the least rated dog')
Image.open(BytesIO(response.content))
One of the least rated dog
Out[69]:
In [70]:
print(f"The most rated dog is {rating.iloc[0].name} with rate {rating.iloc[0]['rating']}")
print(f"The lowest rated dog is {rating.iloc[-1].name} with rate {rating.iloc[-1]['rating']}")
The most rated dog is golden_retriever with rate 173.19675324675308
The lowest rated dog is loggerhead with rate 0.3
In [71]:
twitter_archive_master.sort_values(by='rating', ascending=False)
Out[71]:
tweet_id timestamp text rating_numerator rating_denominator name dog_stage jpg_url dog_type p_conf retweet_count favorite_count rating
714 749981277374128128 2016-07-04 15:00:45 This is Atticus. He's quite simply America af.... 1776.0 10.0 Atticus NaN https://pbs.twimg.com/media/CmgBZ7kWcAAlzFD.jpg NaN NaN 2444 5090 177.600000
1703 670842764863651840 2015-11-29 05:52:33 After so many requests... here you go.\n\nGood... 420.0 10.0 NaN NaN https://pbs.twimg.com/media/CU9P717W4AAOlKx.jpg NaN NaN 8210 23516 42.000000
376 810984652412424192 2016-12-19 23:06:23 Meet Sam. She smiles 24/7 &amp; secretly aspir... 24.0 7.0 Sam NaN https://pbs.twimg.com/media/C0EyPZbXAAAceSc.jpg golden_retriever 0.871342 1452 5384 3.428571
325 819004803107983360 2017-01-11 02:15:36 This is Bo. He was a very good First Doggo. 14... 14.0 10.0 Bo doggo https://pbs.twimg.com/media/C12whDoVEAALRxa.jpg standard_poodle 0.351308 37054 87338 1.400000
1267 685547936038666240 2016-01-08 19:45:39 Everybody needs to read this. Jack is our firs... 14.0 10.0 NaN pupper https://pbs.twimg.com/media/CYOONfZW8AA7IOA.jpg NaN NaN 15366 32487 1.400000
... ... ... ... ... ... ... ... ... ... ... ... ... ...
1885 667549055577362432 2015-11-20 03:44:31 Never seen dog like this. Breathes heavy. Tilt... 1.0 10.0 NaN NaN https://pbs.twimg.com/media/CUOcVCwWsAERUKY.jpg NaN NaN 2113 5485 0.100000
1505 675153376133427200 2015-12-11 03:21:23 What kind of person sends in a picture without... 1.0 10.0 NaN NaN https://pbs.twimg.com/media/CV6gaUUWEAAnETq.jpg NaN NaN 2471 6006 0.100000
1720 670783437142401025 2015-11-29 01:56:48 Flamboyant pup here. Probably poisonous. Won't... 1.0 10.0 NaN NaN https://pbs.twimg.com/media/CU8Z-OxXAAA-sd2.jpg NaN NaN 362 786 0.100000
744 746906459439529985 2016-06-26 03:22:31 PUPDATE: can't see any. Even if I could, I cou... 0.0 10.0 NaN NaN https://pbs.twimg.com/media/Cl2LdofXEAATl7x.jpg NaN NaN 293 2874 0.000000
230 835152434251116546 2017-02-24 15:40:31 When you're so blinded by your systematic plag... 0.0 10.0 NaN NaN https://pbs.twimg.com/media/C5cOtWVWMAEjO5p.jpg American_Staffordshire_terrier 0.012731 2987 22268 0.000000

1979 rows × 13 columns

Most and lowest average rate

In [72]:
avg_rating = twitter_archive_master.groupby(['dog_type']).mean().sort_values(by=['rating'], ascending=False)
avg_rating['avg_rating'] = avg_rating['rating']
avg_rating[['rating_numerator', 'rating_denominator', 'avg_rating']]
Out[72]:
rating_numerator rating_denominator avg_rating
dog_type
racket 13.0 10.0 1.3
paddle 13.0 10.0 1.3
timber_wolf 13.0 10.0 1.3
house_finch 13.0 10.0 1.3
oxygen_mask 13.0 10.0 1.3
... ... ... ...
plow 8.0 10.0 0.8
ram 7.0 10.0 0.7
sunglasses 6.0 10.0 0.6
Japanese_spaniel 5.0 10.0 0.5
loggerhead 3.0 10.0 0.3

164 rows × 3 columns

In [73]:
print(f"The most average rated dog is {avg_rating.iloc[0].name} with average rate {avg_rating.iloc[0]['rating']}")
print(f"The lowest average rated dog is {avg_rating.iloc[-1].name} with average rate {avg_rating.iloc[-1]['rating']}")
The most average rated dog is racket with average rate 1.3
The lowest average rated dog is loggerhead with average rate 0.3
In [74]:
clumber = twitter_archive_master[twitter_archive_master['dog_type'] == 'clumber']['jpg_url'].values[0]
response = requests.get(clumber)
print('One of the least rated dog')
print(twitter_archive_master[twitter_archive_master['dog_type'] == 'clumber']['name'].values[0])
Image.open(BytesIO(response.content))
One of the least rated dog
Sophie
Out[74]:

Correlation between each columns

In [75]:
twitter_archive_master.corr()
Out[75]:
rating_numerator rating_denominator p_conf retweet_count favorite_count rating
rating_numerator 1.000000 0.198444 0.017734 0.018127 0.015940 0.979811
rating_denominator 0.198444 1.000000 -0.013553 -0.020164 -0.027449 -0.001055
p_conf 0.017734 -0.013553 1.000000 0.032842 0.065554 0.136530
retweet_count 0.018127 -0.020164 0.032842 1.000000 0.925425 0.022508
favorite_count 0.015940 -0.027449 0.065554 0.925425 1.000000 0.021719
rating 0.979811 -0.001055 0.136530 0.022508 0.021719 1.000000
In [76]:
print(twitter_archive_master.retweet_count.corr(twitter_archive_master.favorite_count))
sns.regplot(twitter_archive_master.retweet_count, twitter_archive_master.favorite_count);
0.9254252213316151
In [77]:
twitter_archive_master.retweet_count.corr(twitter_archive_master.favorite_count)
Out[77]:
0.9254252213316151

From the table and regression plot above, retweet_count and favorite_count have strong positive correlation.

In [ ]: